Abstract

We  formalize  the  Dolev-Yao  model  of  security  protocols,  using  a  notation 
based  on  multi-set  rewriting  with  existentials.  The  goals  are  to  provide  a  simple 
formal  notation  for  describing  security  protocols,  to  formalize  the  assumptions 
of  the  Dolev-Yao  model  using  this  notation,  and  to  analyze  the  complexity 
of  the  secrecy  problem  under  various  restrictions.  We  prove  that,  even  for  the 
case  where  we  restrict  the  size  of  messages  and  the  depth  of  message  encryption, 
the  secrecy  problem  is  undecidable  for  the  case  of  an  unrestricted  number  of 
protocol  roles  and  an  unbounded  number  of  new  nonces.  We  also  identify 
several  decidable  classes,  including  a  DEXP-complete  class  when  the  number  of 
nonces is restricted, and an NP-complete class when both the number of nonces
and the number of roles are restricted. We point out a remaining open complexity
problem and discuss the implications of these results for the general topic of
protocol analysis.


1  Introduction 

Protocols  based  on  cryptographic  primitives  are  commonly  used  to  protect  access 
to  computer  systems  and  to  protect  transactions  over  the  internet.  Two  well-known 
examples are the Kerberos authentication scheme [KNT94, KN93], used to manage
encrypted passwords on clusters of interconnected computers, and the Secure
Sockets  Layer  [FKK96],  used  by  internet  browsers  and  servers  to  carry  out  secure 
internet  transactions. 

Security protocol design and analysis is a difficult problem. Some of the difficulties
come from subtleties of cryptographic primitives. Further difficulties arise
because  security  protocols  are  required  to  work  properly  when  multiple  instances 
of  the  protocol  are  carried  out  in  parallel,  where  a  malicious  intruder  may  combine 
data from separate sessions in order to confuse honest participants. Moreover, although
the protocols themselves are often very simple, the security properties they
are  supposed  to  achieve  are  rather  subtle  and  should  be  formulated  with  great  care. 
Many  security  protocols  have  been  published  with  subtle  flaws  that  may  be  traced 
to  insufficient  rigor  in  formulating  the  premises  about  capabilities  of  participants. 

In  the  literature  on  security  protocol  design  and  analysis,  protocols  are  commonly 
described using an informal notation that leaves many properties of a protocol unspecified.
For example, a short challenge-response section of a protocol might be
written  as: 

$A \to B : \{n\}_K$
$B \to A : \{f(n)\}_K$

In this notation, a message of the form $\{x\}_y$ consists of a plaintext $x$ encrypted
with key $y$. In this example protocol, Alice chooses a random number $n$ and sends
its encryption to Bob. There is no specific indication of how Bob determines what
to send in response, but we can see that Bob returns a message that contains the
encryption of $f(n)$. By analogy with familiar protocols, we might assume that he
decrypts the message he receives to determine $n$, then applies $f$ to $n$ and returns
the result to Alice (encrypted with the same key).

As  written,  the  protocol  description  only  gives  an  intended  trace  or  family  of 
traces  involving  the  honest  principals.  There  is  no  standard  way  of  determining  the 
initial  conditions  or  assumptions  about  shared  information,  nor  can  we  see  how  the 
principals  will  respond  to  messages  that  differ  from  those  explicitly  written.  For 
example,  in  the  case  at  hand,  we  must  explain  in  English  that  K  is  assumed  to  be  a 
shared  key  and  that  n  is  generated  by  Alice.  Otherwise,  it  is  a  perfectly  reasonable 
interpretation  of  the  two  lines  above  that  Alice  and  Bob  initially  share  a  number 
n. In this case, Alice might send $\{n\}_K$ to Bob, with Bob returning $\{f(n)\}_K$ to
Alice only if he receives precisely $\{n\}_K$. While the two readings of the protocol give
the  same  sequence  of  messages  when  no  one  interferes  with  network  transmission, 
the  effects  are  different  if  an  intruder  intercepts  the  message  from  Alice  to  Bob 
and  replaces  it  with  another  message.  Hence  it  seems  fair  to  say  that  the  notation 
commonly  found  in  the  literature  does  not  provide  a  rigorous  basis  for  security 
protocol  analysis. 
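
The difference between the two readings can be made concrete with a small symbolic sketch. It is only an illustration under stated assumptions: messages are modeled as nested Python tuples, and the function names `bob_decrypting` and `bob_comparing`, as well as the symbolic treatment of `f`, are hypothetical and not part of the notation in the literature.

```python
# Symbolic messages: ("enc", key, payload) stands for {payload}_key.
def enc(key, payload):
    return ("enc", key, payload)


def f(n):
    # Placeholder for the protocol's function f; kept symbolic on purpose.
    return ("f", n)


def bob_decrypting(msg, key):
    """Reading 1: Bob decrypts whatever arrives under K and answers {f(x)}_K."""
    if isinstance(msg, tuple) and len(msg) == 3 and msg[0] == "enc" and msg[1] == key:
        return enc(key, f(msg[2]))
    return None  # not encrypted under K: no reply


def bob_comparing(msg, key, shared_n):
    """Reading 2: Bob already shares n and answers only to exactly {n}_K."""
    if msg == enc(key, shared_n):
        return enc(key, f(shared_n))
    return None


K, n = "K", "n"
honest = enc(K, n)        # the message Alice actually sends
replayed = enc(K, "m")    # a different ciphertext under K, substituted by an intruder

# Without interference the two readings produce the same reply.
assert bob_decrypting(honest, K) == bob_comparing(honest, K, n)

# With a substituted message they diverge: one Bob answers, the other stays silent.
reply_1 = bob_decrypting(replayed, K)
reply_2 = bob_comparing(replayed, K, n)
```

Here `reply_1` is a well-formed answer to the substituted message, while `reply_2` is `None`, which is exactly the kind of behavioral difference the informal notation leaves unspecified.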

In  recent  years,  a  variety  of  methods  have  been  developed  for  analyzing  and 
reasoning  about  security  protocols.  These  approaches  include  specialized  logics  such 
as  BAN  logic  [BAN89],  special-purpose  tools  designed  for  cryptographic  protocol 
analysis [KMM94], as well as theorem-proving [Pau97a, Pau97b] and model-checking
methods using general purpose tools [Low96, Mea96, MMS97, Ros95, Sch96, CJM98,
SS98].

Although there are many differences among these approaches, most current formal
approaches use the same basic model of adversary capabilities, which appears
to have developed from positions taken by Needham and Schroeder [NS78] and a
model presented by Dolev and Yao [DY83]. In this idealized setting, a protocol
adversary is allowed to nondeterministically choose among possible actions. Messages
are composed of indivisible abstract values, not sequences of bits, and encryption is
modeled in an idealized way. The adversary may only send messages composed of
data it “knows” as the result of overhearing past transmissions.
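
As a rough illustration of what it means for the adversary to send only data it “knows”, the following sketch computes a knowledge set from overheard messages. It is a simplification under loudly stated assumptions: only pairing and symmetric encryption are modeled, messages are nested tuples, and the names `analyze` and `can_send` are hypothetical. The intruder model actually used in this paper is the one given in Section 3.

```python
def analyze(observed):
    """Close overheard messages under unpairing and decryption with known keys."""
    known = set(observed)
    changed = True
    while changed:
        changed = False
        for m in list(known):
            parts = []
            if isinstance(m, tuple) and m[0] == "pair":
                parts = [m[1], m[2]]          # both components become known
            elif isinstance(m, tuple) and m[0] == "enc" and m[1] in known:
                parts = [m[2]]                # key is known, so the payload is too
            for p in parts:
                if p not in known:
                    known.add(p)
                    changed = True
    return known


def can_send(msg, known):
    """True if msg can be built from known data by pairing and encryption."""
    if msg in known:
        return True
    if isinstance(msg, tuple) and msg[0] in ("pair", "enc"):
        return can_send(msg[1], known) and can_send(msg[2], known)
    return False


# The intruder overhears {<n, k2>}_k1 and also learns k1.
overheard = {("enc", "k1", ("pair", "n", "k2")), "k1"}
knowledge = analyze(overheard)
```

With this knowledge set, `can_send(("enc", "k2", "n"), knowledge)` holds, whereas a message under an unknown key `k3` cannot be composed.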

The Dolev-Yao abstraction makes symbolic reasoning about cryptographic protocols
a viable approach. Perhaps the simplest approach in this regard is to consider
protocols as a form of rewriting, so that protocol execution can be carried
out symbolically. This observation was sharpened to a rigorous, formal definition
of the Dolev-Yao model by means of multiset rewriting with existential quantification,
MSR, introduced in [Mit98, DM99b, CDL+99]. In addition to rewriting to
effect state transitions, we also needed a way to choose new values, such as nonces
or keys. While this seems difficult to achieve directly in standard rewriting formalisms,
the proof rules associated with existential quantification appear to be just
what is required. Therefore, we have adopted a notation that may be regarded as an
extension of multiset rewriting (see, e.g., [BM93, BB90]) with existential quantification.
This formalism is quite palatable and quite close to the informal, traditional
way of describing protocol message exchange, described above. Since its inception
in [DM99b, CDL+99], the MSR formalism has been applied and extended in several
ways. MSR has been incorporated into a high-level specification language for
authentication protocols, CAPSL [DM99a]. A typed version of MSR is studied
in [Cer01c]. MSR has been successfully applied in the analysis of widely used protocols
such as Kerberos 5 [BCJS02]. MSR is used as a formal setting for a game-based
analysis of contract-signing protocols in [RC01].

The  importance  of  existential  quantification,  for  security  protocols,  is  that  it 
provides  a  direct  mechanism  for  choosing  a  new  value  that  is  different  from  other 
values  used  in  the  execution  of  a  system.  Since  many  protocols  involve  choosing  fresh 
nonces,  fresh  encryption  keys,  and  so  on,  existential  quantification  seems  like  a  useful 
primitive  for  describing  security  protocols.  While  existential  quantification  does  not 
semantically  imply  there  exist  “new”  values  with  certain  properties,  standard  proof 
rules  for  manipulating  existential  quantifiers  require  introduction  of  fresh  symbols 
(sometimes  called  Skolem  constants).  The  way  that  existential  quantification  is  used 
in  our  formalism  is  based  on  the  standard  existential  elimination  rule  from  natural 
deduction. If we have an existentially quantified axiom, $\exists x.\varphi$, then this rule says
that if we wish to prove some formula $\psi$, we can choose a new symbol $y$ for the “$x$
that is presumed to exist” and proceed to derive $\psi$ from $[y/x]\varphi$. The side condition
“$y$ not free in any other hypothesis in the proof of $\psi$” means that the only hypothesis
in the proof of $\psi$ that can contain $y$ is the hypothesis $[y/x]\varphi$.

Our  multiset  rewriting  framework  with  existential  quantification  (MSR)  may 
also be viewed as the existential Horn fragment of first-order linear logic [Gir87a].
The close connection between standard multiset rewriting (without existential quantification)
and simple fragments of linear logic has been studied extensively [Asp87,
MOM91, GG90b, Kan94] and extended in [CDKS00] to include parameters and existential
quantification. Under this correspondence, every MSR transition sequence
corresponds  to  a  linear  logic  derivation  in  normal  form,  and  conversely. 

The automated linear logical framework tool LLF [CP96] may be used to simulate
the execution of protocols, detect attacks, and construct formal proofs about protocol
transformations [CDL+99]. A similar fragment of linear logic is used in [KOS98]
as  a  basis  for  a  specification  language  for  real-time  systems.  Linear  logic  is  also  used 
to  model  the  state-transition  aspect  of  protocols,  but  not  existential  quantification 
for  nonces,  in  [CD98,  DMT98]. 

As  presented  in  [DM99b,  CDL+99],  a  protocol  theory  consists  of  three  parts:  a 
bounded  phase  describing  protocol  initialization  that  distributes  keys  or  establishes 
other  shared  information,  a  role  generation  theory  that  designates  possibly  multiple 
roles  that  each  principal  may  play  in  a  protocol  (such  as  initiator,  responder,  client, 
or  server),  and  a  disjoint  union  of  bounded  subtheories  that  each  characterize  a 
possible  role.  Encryption  is  typed,  which  prevents  arbitrarily  nested  encryption 
terms.  These  syntactic  restrictions,  which  are  discussed  in  detail  in  the  first  part  of 
the  present  paper,  make  it  possible  to  distinguish,  in  precise  terms,  protocols  from 
general  rewrite  systems.  This  particular  feature  of  the  MSR  formalism  is  a  novel 
contribution  to  security  protocol  analysis;  it  seems  to  have  no  counterpart  in  the 
richer  formalisms  such  as  [AG99].  Furthermore,  this  feature  of  the  MSR  formalism 
allows  us  to  identify  two  important  parameters  of  a  protocol  itself:  the  number 
of  roles  and  the  number  of  new  data  (such  as  nonces  or  keys)  introduced  by  the 
protocol. 

Using  our  precise  form  of  protocol  theory,  in  the  second  part  of  the  paper  we 
discuss  in  detail  several  decidability  and  complexity  results  regarding  the  secrecy 
property for protocols, most of which were established in [DM99b, CDL+99]. Informally,
a protocol satisfies secrecy if some privileged information (fixed in advance)
will never be released to the adversary. In MSR this property may be stated as
unreachability: a global configuration in which the intruder is in possession of the
specified secret is not reachable by protocol execution steps. Hence the failure of secrecy
is stated as reachability. We show that secrecy is an undecidable property even
if data constructors, message depth, message width, number of distinct roles, role
length, and depth of encryption are bounded by constants. DEXPTIME-completeness
of the failure of secrecy is shown for protocols further restricted to allow only a
fixed number of new data. Furthermore, NP-completeness of the failure of secrecy
is shown for protocols restricted even further to have a fixed number of roles. The
latter upper bound has recently been strengthened considerably in [RT01, ALV02]:
the problem is in NP even without any bound on message size.

In  some  ways,  undecidability  might  not  be  expected  for  protocols.  The  reason 
is  that  there  is  only  a  finite  number  of  possible  messages,  except  for  the  unbounded 
number  of  new  nonces  that  repeated  runs  of  a  protocol  might  generate.  However, 
our undecidability proof shows that nonces may be used as a form of “pointer,”
linking together messages that contain only simple data. However innocuous they
may seem, nonces are at the heart of the problem in analyzing this class of security
protocols. In the undecidability proof, the intruder stores encryptions of
all atomic formulas derivable from a given existential Horn theory without function
symbols, replaying these messages as needed in order for the protocol steps to
carry out an arbitrary deduction. The undecidability of the implication problem for
existential Horn clauses without function symbols follows from [CLM81] and may
also be obtained directly by axiomatizing a Cook’s-theorem-style Turing machine
tableau [DLMS99]. If protocols are further restricted to generate no new data during
execution, then DEXPTIME-hardness follows by the same encoding of Horn formulas
(Datalog programs) as in our undecidability proof, applied to Horn clauses without
function symbols and without existential quantification. For these Horn theories,
DEXPTIME-hardness of the implication problem (measured as a function of the size
of the theory) is implicit in [Imm86, Var82], as explained in [DEGV97].

The multiset rewriting formalism (MSR) is presented in Section 2 by means of several
increasingly  complex  examples.  In  Section  3  we  show  how  to  represent  security 
protocol  theories  in  MSR,  introducing  the  modeling  of  nonces,  roles,  the  intruder, 
and  encryption.  A  detailed  example  of  the  Needham-Schroeder  Public  Key  Protocol 
in  MSR  is  in  Section  4.  In  Section  5  we  show  complexity  results  for  security  protocols 
under  various  restrictions.  In  Section  6  we  show  some  examples  of  protocols  that 
demonstrate  some  of  the  lower  bounds  in  a  more  practical  setting.  In  Section  7  we 
discuss  related  work,  and  finally  in  Section  8  we  present  conclusions. 

2  Multiset  Rewriting  with  Existential  Quantification 

2.1  Protocol  Notation 

We  introduce  a  formalism  for  describing  a  class  of  nondeterministic  infinite-state 
systems.  The  formalism  is  similar  in  many  respects  to  standard  rewrite  systems 
[Klo87,  Mit96],  with  two  main  differences.  The  first  is  that  instead  of  representing 
information  by  a  single  expression,  we  use  multisets  of  first-order  atomic  formulas. 
(A  multiset  is  similar  to  a  set,  but  with  counting  of  duplicates.)  The  second  main 
difference is that the formalism has a basic mechanism for choosing “new” symbols.
This is important for modeling protocols that choose a new nonce or generate
encryption  keys. 

Our  formalism  can  also  be  viewed  as  a  Horn  fragment  of  linear  logic  [Gir87b, 
Asp87,  MOM89,  GG90a,  Kan94,  Cer95].  A  similar  fragment  of  linear  logic  is  used 
in  [KOS98]  to  represent  real-time  finite-state  systems.  Two  other  efforts  using  linear 
logic to model the state-transition aspect of protocols (but not existential quantification
for nonces) are [CD98, DMT98].

The  multiset  rewriting  notation  is  also  related  to  the  Chemical  Abstract  Machine 
formalism  [BB90],  with  the  primary  difference  being  the  addition  of  existentials. 

The  syntax  involves  terms,  facts  and  rules.  If  we  want  to  represent  a  system  in 
this  formalism,  we  begin  by  choosing  a  vocabulary,  or  first-order  signature.  This  is 
a standard notion from multi-sorted first-order logic [End72].

Signatures  A first-order signature consists of a set of sorts, together with function
symbols and predicate symbols with specific sorts. The sorts indicate the kinds of
data that will be used in the model. For example, the sorts used in a protocol
may be key, msg, and nonce, for encryption keys, message contents, and nonces. Function
symbols are names for functions on the sorts of the signature. For example, an
encryption function might have sort

$$\mathsf{encrypt} : \mathsf{key} \times \mathsf{msg} \to \mathsf{cipher}$$

where cipher is the sort for ciphertexts (encrypted text). In any signature, each
function symbol must have a fixed set of parameter sorts (one for each function argument)
and a result sort. A function with no arguments is called a constant symbol.
Finally, a multi-sorted first-order signature has a set of predicate symbols, each with
a fixed set of parameter sorts. It is also possible to consider order-sorted signatures
[Gog78], and in fact we will find this convenient for specifying some example protocol
theories in our formalism in Section 3.

Terms  The  terms  over  a  signature  are  the  set  of  expressions  formed  by  applying 
functions  to  arguments.  In  each  case,  a  function  must  be  applied  to  arguments  of  the 
correct sort. For example, if $f : s \to t$ and $x : s$, then $f(x)$ is a well-formed term since
the  argument  sort  of  f  matches  the  sort  of  x.  As  suggested  by  this  example,  terms 
may  contain  variables,  but  each  variable  must  have  an  associated  sort.  A  variable  is 
not  allowed  to  be  used  with  different  sorts  in  different  expressions  associated  with 
the  same  system.  (All  of  this  can  be  formalized  using  an  inductive  definition  of  the 
well-formed  terms  and  their  sorts,  but  we  assume  that  most  readers  will  be  familiar 
with  these  standard  concepts  from  logic.) 
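
The sort discipline on terms can be sketched as a small checker. This is a minimal illustration, not the inductive definition alluded to above: the signature contents (`k`, `hello`), the variable table, and the function name `sort_of` are assumptions made for the example, and terms are encoded as tuples of the form `(symbol, arg1, ...)` with variables as bare strings.

```python
# Hypothetical signature: function symbol -> (argument sorts, result sort).
SIGNATURE = {
    "encrypt": (("key", "msg"), "cipher"),
    "k": ((), "key"),       # constant symbol: no arguments
    "hello": ((), "msg"),   # constant symbol: no arguments
}

# Each variable has exactly one sort throughout the system.
VARIABLES = {"x": "msg"}


def sort_of(term):
    """Return the sort of a well-formed term, or raise TypeError if ill-sorted."""
    if isinstance(term, str):                 # a variable
        return VARIABLES[term]
    symbol, *args = term
    arg_sorts, result = SIGNATURE[symbol]
    actual = tuple(sort_of(a) for a in args)  # check arguments recursively
    if actual != arg_sorts:
        raise TypeError(f"{symbol} expects {arg_sorts}, got {actual}")
    return result


well_formed = sort_of(("encrypt", ("k",), "x"))   # encrypt(k, x)
```

Swapping the arguments, as in `("encrypt", "x", ("k",))`, raises `TypeError` because the argument sorts no longer match the declared parameter sorts.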

Facts  A  fact  is  a  ground  (i.e.  variable-free)  first-order  atomic  formula.  This  means 
that  a  fact  is  the  result  of  applying  a  predicate  symbol  to  ground  terms  of  the  correct 
sorts. 

States  A  state  is  a  multiset  of  facts  (all  over  the  same  signature).  In  this  paper 
we  are  only  concerned  with  finite  multisets. 
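
One convenient way to experiment with states is to represent a finite multiset of facts with Python's `collections.Counter`. The encoding below, where a fact is a tuple of a predicate name followed by ground arguments, and the helper name `remove_fact` are assumptions of this sketch.

```python
from collections import Counter

# A state counts how many copies of each fact it contains.
state = Counter([("P", "a"), ("P", "a"), ("Q", "b")])


def remove_fact(state, fact):
    """Return a new state with one copy of fact removed; fail if it is absent."""
    if state[fact] == 0:
        raise KeyError(fact)
    nxt = state.copy()
    nxt[fact] -= 1
    return +nxt   # unary plus discards facts whose count dropped to zero


after_one = remove_fact(state, ("P", "a"))      # one copy of P(a) remains
after_two = remove_fact(after_one, ("P", "a"))  # P(a) is gone entirely
```

Unlike a set, the multiset distinguishes `state` (two copies of `P(a)`) from `after_one` (one copy), which matters when a rule consumes a fact.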

Rules  State  transitions  are  written  using  two  multisets  of  atomic  formulas,  in  the 
following  syntactic  form: 

$$F_1, \ldots, F_k \longrightarrow \exists x_1 \ldots \exists x_j.\; G_1, \ldots, G_n$$

The meaning of this rule is that if state $S$ contains facts $\sigma F_1, \ldots, \sigma F_k$ for some ground
substitution $\sigma$, then one possible next state is the state $S'$ that is similar to $S$, but
with:

•  facts $\sigma F_1, \ldots, \sigma F_k$ removed,
•  $\sigma G_1, \ldots, \sigma G_n$ added, where substitution $\sigma$ replaces $x_1, \ldots, x_j$ by new constant
symbols.

While  existential  quantification  does  not  semantically  imply  there  exist  “new” 
values  with  certain  properties,  standard  proof  rules  for  manipulating  existential 
quantifiers require introduction of fresh symbols (sometimes called Skolem constants),
as described in Section 2.3.

If there are free variables in the rule $F_1, \ldots, F_k \longrightarrow \exists x_1 \ldots \exists x_j.\; G_1, \ldots, G_n$, these
are  treated  as  universally  quantified.  In  an  application  of  a  rule,  these  free  variables 
may  be  replaced  by  any  terms. 

For example, consider the state

$$S = \{P(f(a)), P(b)\}$$

and rule

$$P(x) \longrightarrow P(f(x)).$$

Then one possible next state is obtained by using the substitution $\sigma = [x \mapsto f(a)]$,
instantiating the rule to

$$P(f(a)) \longrightarrow P(f(f(a))).$$

With this substitution, we can remove $P(f(a))$ from $S$ and obtain the next state
$S' = \{P(f(f(a))), P(b)\}$.

We can then use a different instance of the rule, with substitution $\sigma' = [x \mapsto b]$,

$$P(b) \longrightarrow P(f(b))$$

to reach state $S'' = \{P(f(f(a))), P(f(b))\}$. It is also possible to reach $S''$ from $S$ by
performing these replacements in the opposite order.
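
The two rewriting steps of this example can be replayed mechanically. The sketch below is one possible encoding, not a definition from the formalism: variables are strings beginning with `?`, constants are plain strings, compound terms and facts are tuples, and the helpers `match`, `instantiate`, and `apply_rule` are hypothetical names. It handles only rules with a single fact on each side and no existential quantifiers.

```python
from collections import Counter


def is_var(t):
    return isinstance(t, str) and t.startswith("?")


def match(pattern, term, subst):
    """Extend subst so that pattern under subst equals term; return None on failure."""
    if is_var(pattern):
        if pattern in subst:
            return subst if subst[pattern] == term else None
        return {**subst, pattern: term}
    if isinstance(pattern, tuple) and isinstance(term, tuple) and len(pattern) == len(term):
        for p, t in zip(pattern, term):
            subst = match(p, t, subst)
            if subst is None:
                return None
        return subst
    return subst if pattern == term else None


def instantiate(term, subst):
    """Apply a substitution to a term."""
    if is_var(term):
        return subst[term]
    if isinstance(term, tuple):
        return tuple(instantiate(t, subst) for t in term)
    return term


def apply_rule(state, lhs, rhs, fact):
    """Apply the rule lhs -> rhs to one chosen fact of the state, if it matches."""
    subst = match(lhs, fact, {})
    if subst is None or state[fact] == 0:
        return None
    nxt = state.copy()
    nxt[fact] -= 1                          # consume the matched fact
    nxt[instantiate(rhs, subst)] += 1       # add the instantiated right-hand side
    return +nxt                             # drop facts with count zero


LHS, RHS = ("P", "?x"), ("P", ("f", "?x"))            # P(x) -> P(f(x))
S = Counter([("P", ("f", "a")), ("P", "b")])          # {P(f(a)), P(b)}
S1 = apply_rule(S, LHS, RHS, ("P", ("f", "a")))       # x := f(a)
S2 = apply_rule(S1, LHS, RHS, ("P", "b"))             # x := b

# Performing the two replacements in the opposite order reaches the same state.
other_order = apply_rule(apply_rule(S, LHS, RHS, ("P", "b")), LHS, RHS, ("P", ("f", "a")))
assert S2 == other_order
```

The caller chooses which fact to rewrite, which mirrors the nondeterminism of the formalism: several rule instances may be applicable to the same state.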

If  a  function  is  invertible,  then  this  can  also  be  expressed  as  a  rule.  For  example, 
the  rule 

$$P(f(x)) \longrightarrow Q(x)$$

involves recovering the data $x$ from $f(x)$. We will use rules of this form to describe
decryption  of  encrypted  messages. 
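
A rule of this inverting form can be sketched directly by pattern inspection. The tuple encoding of facts and the function name `invert_step` are assumptions of this illustration; the sketch applies the rule to the first matching fact it finds.

```python
from collections import Counter


def invert_step(state):
    """Apply P(f(x)) -> Q(x) once: consume one P(f(x)) fact and produce Q(x)."""
    for fact in state:
        is_p_of_f = (
            fact[0] == "P"
            and isinstance(fact[1], tuple)
            and fact[1][0] == "f"
        )
        if is_p_of_f:
            nxt = state.copy()
            nxt[fact] -= 1
            nxt[("Q", fact[1][1])] += 1   # the argument of f is recovered
            return +nxt
    return None  # no fact of the form P(f(x)) is present


before = Counter([("P", ("f", "a")), ("P", "b")])
after = invert_step(before)
```

The fact `P(b)` is left untouched because `b` is not of the form `f(x)`, so a second call on `after` finds nothing to rewrite.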

An MSR theory is defined by a signature and a set of rules. Given an MSR
theory and an initial state, there is a set of traces, each a sequence of states in which
every state is reached from the previous one by applying one of the rules from the theory.

2.2  Example:  Finite  Automata 

As  a  first  example,  without  existential  quantification,  we  describe  a  method  for 
presenting  finite-state  automata  in  this  notation.  Assuming  we  have  some  specific 
automaton  A,  we  choose  a  vocabulary  for  describing  the  input  tape  and  the  states 
of  the  automaton,  and  we  have  a  rule  corresponding  to  each  state  transition  of  the 
automaton.  Each  rule  consumes  an  input  and  moves  to  the  next  state.  The  rules 
will  depend  on  the  specific  automaton  A,  but  the  basic  method  can  be  applied  to 
any  automaton. 

Sorts  Given an automaton A, the signature for the theory Th(A) has three sorts:
st  for  automaton  states,  symb  for  input  symbols  and  string  for  lists  of  input  symbols. 

Predicates  We  use  predicates  to  represent  the  automaton  state,  and  its  current 
input  string. 

State : st         current state
Input : string     current input string

Functions  We use the cons function to represent concatenation of strings,

$$\mathsf{cons} : \mathsf{symb} \times \mathsf{string} \to \mathsf{string}$$

For simplicity, we will write $a \cdot x$ for $\mathsf{cons}(a, x)$.

Constants  We  need  names  for  the  states  of  automaton  A,  and  names  for  the 
symbols  of  the  input  alphabet,  as  well  as  a  name  for  the  empty  string. 

$q_0, q_1, q_2, \ldots$ : st     finite set of states
$a, b$ : symb                   input alphabet
$\mathsf{nil}$ : string         empty string

Rules  There  is  one  rule  for  each  state  transition  of  A.  For  example,  here  are  some 
rules describing possible transitions between states $q_0$ and $q_1$:

$$\mathsf{State}(q_0),\ \mathsf{Input}(a \cdot x) \longrightarrow \mathsf{State}(q_1),\ \mathsf{Input}(x)$$

$$\mathsf{State}(q_0),\ \mathsf{Input}(b \cdot x) \longrightarrow \mathsf{State}(q_1),\ \mathsf{Input}(x)$$

$$\mathsf{State}(q_1),\ \mathsf{Input}(a \cdot x) \longrightarrow \mathsf{State}(q_1),\ \mathsf{Input}(x)$$

$$\mathsf{State}(q_1),\ \mathsf{Input}(b \cdot x) \longrightarrow \mathsf{State}(q_0),\ \mathsf{Input}(x)$$

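A sample derivation gives us automaton state transitions from $q_0$ to $q_1$ and back on
input $a \cdot b \cdot \mathsf{nil}$. We will write this out as a sequence of states, starting with the
multiset $\{\mathsf{State}(q_0), \mathsf{Input}(a \cdot b \cdot \mathsf{nil})\}$ that represents the automaton in state $q_0$ with
input string $a \cdot b \cdot \mathsf{nil}$.

$$\begin{array}{ll}
 & \{\mathsf{State}(q_0),\ \mathsf{Input}(a \cdot b \cdot \mathsf{nil})\} \\
\longrightarrow & \{\mathsf{State}(q_1),\ \mathsf{Input}(b \cdot \mathsf{nil})\} \\
\longrightarrow & \{\mathsf{State}(q_0),\ \mathsf{Input}(\mathsf{nil})\}
\end{array}$$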

It  should  be  easy  to  see  that  if  we  begin  with  a  system  state  consisting  of  one  fact 
about  the  state  of  the  automaton,  and  one  fact  about  the  input  string,  then  we  can 
only  reach  other  system  states  of  this  form.  In  particular,  we  can  never  reach  a 
system  state  where  the  automaton  is  in  two  states. 
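
The sample derivation can be reproduced with a few lines of code. This is a sketch under stated assumptions: input strings are encoded as Python tuples with `()` playing the role of nil, the table `RULES` and the function `step` are hypothetical names, and only the two transitions exercised by the sample derivation are encoded.

```python
from collections import Counter

# (current state, next input symbol) -> next state, for the two transitions
# used in the sample derivation.
RULES = {("q0", "a"): "q1", ("q1", "b"): "q0"}


def step(state):
    """Apply the applicable rule to a state {State(q), Input(s)}, or return None."""
    # The unpacking enforces the invariant: exactly one State and one Input fact.
    (q,) = [fact[1] for fact in state if fact[0] == "State"]
    (s,) = [fact[1] for fact in state if fact[0] == "Input"]
    if not s or (q, s[0]) not in RULES:
        return None                       # input exhausted or no rule applies
    return Counter([("State", RULES[(q, s[0])]), ("Input", s[1:])])


start = Counter([("State", "q0"), ("Input", ("a", "b"))])
trace = [start]
while (nxt := step(trace[-1])) is not None:
    trace.append(nxt)
```

Every state in `trace` contains exactly one `State` fact and one `Input` fact, in line with the observation that the automaton can never be in two states at once.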

2.3  Existential Quantification

It is possible to give finite descriptions of infinite-state systems using function symbols.
For example, if we have $0 : \mathsf{nat}$ and $\mathsf{suc} : \mathsf{nat} \to \mathsf{nat}$, then we can write
expressions for arbitrarily many natural numbers. If each system state has a natural
number, then we will have infinitely many possible system states.

Existential  quantification  provides  an  alternate  way  of  expressing  infinitely  many 
possible states. As we will see in Section 3, the importance of existential quantification
for security protocols is that it provides a direct mechanism for choosing
a  new  value  that  is  different  from  other  values  used  in  the  execution  of  a  system. 
Since  many  protocols  involve  choosing  fresh  nonces,  fresh  encryption  keys,  and  so 
on,  existential  quantification  seems  like  a  useful  primitive  for  describing  security 
protocols. 

The  way  that  existential  quantification  is  used  in  our  formalism  is  based  on  the 
existential  elimination  rule  from  natural  deduction.  This  proof  rule  is  commonly 
written  as  follows. 


$$(\exists\ \text{elim})\quad \frac{\exists x.\varphi \qquad \begin{array}{c} [c/x]\varphi \\ \vdots \\ \psi \end{array}}{\psi} \qquad c \text{ does not occur in any other hypothesis}$$

If we have an existentially quantified axiom, $\exists x.\varphi$, then this rule says that if we wish
to prove some formula $\psi$, we can choose a new constant symbol $c$ for the “$x$ that is
presumed to exist” and proceed to derive $\psi$ from $[c/x]\varphi$. The side condition “$c$ does
not occur in any other hypothesis in the proof of $\psi$” means that the only hypothesis
in the proof of $\psi$ that can contain $c$ is the hypothesis $[c/x]\varphi$.
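
Operationally, the side condition amounts to drawing the constant from a supply of symbols that have not been used before. The following sketch illustrates that reading; the class `FreshSupply`, the function `eliminate_exists`, and the naming scheme `c0, c1, ...` are assumptions of the example, and formulas are encoded as nested tuples.

```python
import itertools


class FreshSupply:
    """Hands out constant symbols that have never been used before."""

    def __init__(self, used=()):
        self._used = set(used)
        self._counter = itertools.count()

    def fresh(self, prefix="c"):
        while True:
            name = f"{prefix}{next(self._counter)}"
            if name not in self._used:     # skip symbols that already occur
                self._used.add(name)
                return name


def eliminate_exists(bound_var, body, supply):
    """Replace bound_var in body by a fresh constant, mirroring [c/x]phi."""
    c = supply.fresh()

    def subst(t):
        if t == bound_var:
            return c
        if isinstance(t, tuple):
            return tuple(subst(u) for u in t)
        return t

    return c, subst(body)


supply = FreshSupply(used={"c0", "c1"})               # symbols already in use
c, instance = eliminate_exists("x", ("P", "x", ("f", "x")), supply)
```

Because the supply remembers every symbol it has issued, two eliminations never yield the same constant, which is the property that makes existential quantification suitable for modeling nonces.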

2.4  Example:  Turing  Machine 

We  can  see  how  existential  quantification  allows  us  to  describe  infinite-state  systems 
by  axiomatizing  a  Turing  machine.  This  construction  shows  that  MSR  theories  with 
existentials  are  undecidable.  Later  (in  Appendix  A),  we  will  use  other  encodings 
of  Turing  machines  to  prove  the  undecidability  and  complexity  lower  bounds  for 
security  protocols  in  the  presence  of  an  attacker.  Because  of  the  attacker  and  the 
details  of  the  protocols,  the  encoding  used  in  those  examples  will  be  different. 

Let us assume we have some specific Turing machine M. We choose a vocabulary
for describing states of the machine and its input, and write rules to describe
transitions according to machine state and input. The rules will depend on the
specific  machine  M,  but  the  basic  method  can  be  applied  to  any  Turing  machine. 

Sorts  The signature for this theory has three sorts: state for the Turing machine's
states, cell for the cell names, and symbol for the cell contents.

Predicates  The first predicate used in this example is used to describe the current
machine  state  and  tape  position.  The  other  two  predicates  describe  the  contents  of 
a  tape  cell  and  the  order  (adjacency)  between  cells. 

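$\mathsf{Curr} : \mathsf{state} \times \mathsf{cell}$      current state, tape position
$\mathsf{Cont} : \mathsf{cell} \times \mathsf{symbol}$     contents of cell is symbol
$\mathsf{Adj} : \mathsf{cell} \times \mathsf{cell}$        keep cells in order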

Constants  We  also  need  names  for  the  states  of  the  machine  M,  names  for  the 
cells  at  the  beginning  and  end  of  the  tape,  and  names  for  the  symbols  that  may 
appear on the tape (0, 1, and blank). The reason we have an end-of-tape cell, $c_{eot}$,
is  that  we  will  represent  an  unbounded  tape  by  including  rules  that  will  allow  us  to 
allocate  as  many  tape  cells  as  needed.  In  other  words,  we  will  represent  the  Turing 
machine  tape  by  explicitly  constructing  the  finite  list  of  cells  that  the  machine  has 
looked  at,  one  at  a  time. 

$q_0, q_1, q_2, \ldots$ : state       finite set of states
$c_0, c_1, \ldots, c_{eot}$ : cell    initial tape cells
$0, 1, \Box$ : symbol                 tape symbols

Rules  There are three classes of transition rules: tape maintenance rules, transition
rules that correspond to Turing machine moves that move the head to the right,
and transition rules that correspond to moving the head left. Initially, we will start
the machine with two tape cells, the leftmost cell $c_0$ and the rightmost end-of-tape
cell $c_{eot}$. This is expressed by the fact

$$\mathsf{Adj}(c_0, c_{eot})$$

At any step in the computation, we can apply the tape maintenance rule

$$\mathsf{Adj}(c, c_{eot}) \longrightarrow \exists c'.\; \mathsf{Adj}(c, c'),\ \mathsf{Cont}(c', \Box),\ \mathsf{Adj}(c', c_{eot})$$

Informally, this rule “says” that if cell $c$ is adjacent to the end-of-tape cell, then we
can allocate a new cell $c'$ and place $c'$ between $c$ and the end-of-tape cell. The new
cell will be blank. An example computation below shows how this rule can be used.
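
The tape maintenance rule is the one place in this example where the existential quantifier does real work, so a small sketch may help. It assumes facts are tuples such as `("Adj", "c0", "c_eot")`, writes the blank symbol as the string `"blank"`, and picks the fresh cell name by scanning for the first `c<i>` not yet occurring in the state. The function name `extend_tape` is hypothetical.

```python
import itertools
from collections import Counter


def extend_tape(state, end="c_eot"):
    """Adj(c, c_eot) -> exists c'. Adj(c, c'), Cont(c', blank), Adj(c', c_eot)."""
    # Collect every symbol occurring in the state so the new cell is truly fresh.
    used = {arg for fact in state for arg in fact[1:]}
    new = next(f"c{i}" for i in itertools.count() if f"c{i}" not in used)
    for fact in state:
        if fact[0] == "Adj" and fact[2] == end:
            nxt = state.copy()
            nxt[fact] -= 1                                # consume Adj(c, c_eot)
            nxt.update([
                ("Adj", fact[1], new),                    # c is now next to c'
                ("Cont", new, "blank"),                   # the new cell is blank
                ("Adj", new, end),                        # c' precedes end-of-tape
            ])
            return +nxt
    return None


tape0 = Counter([("Adj", "c0", "c_eot")])
tape1 = extend_tape(tape0)
tape2 = extend_tape(tape1)
```

Each call allocates one more blank cell immediately before the end-of-tape cell, so repeated application yields a tape of any required finite length.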

The  rules  for  the  actual  moves  of  the  Turing  machine  will  depend  on  the  structure 
of  the  specific  machine  M  we  wish  to  represent.  Suppose  that  Turing  machine  M 
moves to the right, if it is in state $q_i$ with symbol 0 on the tape cell currently under
the tape head. If the move of M, in this case, is to state $q_j$, writing 1 into the tape
cell, we will have a rule of the following form:

$$\mathsf{Curr}(q_i, c),\ \mathsf{Cont}(c, 0),\ \mathsf{Adj}(c, c') \longrightarrow \mathsf{Curr}(q_j, c'),\ \mathsf{Cont}(c, 1),\ \mathsf{Adj}(c, c')$$

If, instead of moving right, the machine would move left in this case, we would
instead have the transition rule

$$\mathsf{Curr}(q_i, c),\ \mathsf{Cont}(c, 0),\ \mathsf{Adj}(c', c) \longrightarrow \mathsf{Curr}(q_j, c'),\ \mathsf{Cont}(c, 1),\ \mathsf{Adj}(c', c)$$

Note that when moving to the right, we assume that there is a tape cell to the right of the
tape  head.  This  assumption  can  be  satisfied  by  using  the  tape  maintenance  rule  if 
needed  before  executing  this  Turing  machine  move. 
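
The right-move and left-move rules differ only in which side of the `Adj` fact the current cell appears on, which the following sketch makes explicit. The function `move`, its parameter names, and the tuple encoding of facts are assumptions of this illustration; tape symbols are written as strings.

```python
from collections import Counter


def move(state, q_from, read, q_to, write, direction):
    """Apply one machine rule; direction is 'R' or 'L'. Return None if inapplicable."""
    for fact in state:
        if fact[0] != "Curr" or fact[1] != q_from:
            continue
        c = fact[2]                                   # cell under the tape head
        if state[("Cont", c, read)] == 0:
            return None                               # wrong symbol under the head
        for adj in state:
            if adj[0] != "Adj":
                continue
            if direction == "R" and adj[1] == c:      # Adj(c, c'): c' is to the right
                target = adj[2]
            elif direction == "L" and adj[2] == c:    # Adj(c', c): c' is to the left
                target = adj[1]
            else:
                continue
            nxt = state.copy()
            nxt.subtract([fact, ("Cont", c, read)])
            nxt.update([("Curr", q_to, target), ("Cont", c, write)])
            return +nxt                               # the Adj fact is left in place
    return None


config = Counter([
    ("Curr", "qi", "c0"), ("Cont", "c0", "0"), ("Cont", "c1", "blank"),
    ("Adj", "c0", "c1"), ("Adj", "c1", "c_eot"),
])
moved_right = move(config, "qi", "0", "qj", "1", "R")
moved_left = move(config, "qi", "0", "qj", "1", "L")   # no cell to the left of c0
```

In this configuration the right move succeeds, while the left move returns `None` because no fact `Adj(c', c0)` exists.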

Sample computation  Although the exact moves will depend on the specific Turing
machine that is represented by this method, we can illustrate the use of existential
quantification by showing some example moves of a sample machine. Let us
consider a Turing machine that reads the input tape until two consecutive 0’s are
read, then inserts a 1 after both of them and moves to the left over the second 0.
Suppose that the machine starts in state $q_0$ and remains in this state until it reaches
a cell containing 0. At that point, the machine changes to state $q_1$ to “remember
that it has seen a 0” and moves right. Then the first few moves of the machine on
input 100 might appear as follows:

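$$\begin{array}{ll}
 & \mathsf{Curr}(q_0, c_0),\ \mathsf{Cont}(c_0, 1),\ \mathsf{Cont}(c_1, 0),\ \mathsf{Cont}(c_2, 0),\ \mathsf{Adj}(c_0, c_1),\ \mathsf{Adj}(c_1, c_2),\ \mathsf{Adj}(c_2, c_{eot}) \\
\longrightarrow & \mathsf{Curr}(q_0, c_1),\ \mathsf{Cont}(c_0, 1),\ \mathsf{Cont}(c_1, 0),\ \mathsf{Cont}(c_2, 0),\ \mathsf{Adj}(c_0, c_1),\ \mathsf{Adj}(c_1, c_2),\ \mathsf{Adj}(c_2, c_{eot}) \\
\longrightarrow & \mathsf{Curr}(q_1, c_2),\ \mathsf{Cont}(c_0, 1),\ \mathsf{Cont}(c_1, 0),\ \mathsf{Cont}(c_2, 0),\ \mathsf{Adj}(c_0, c_1),\ \mathsf{Adj}(c_1, c_2),\ \mathsf{Adj}(c_2, c_{eot})
\end{array}$$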

At  this  point,  the  appropriate  transition  will  be  a  move  to  the  right  on  to  the  next 
cell,  where  the  machine  will  write  a  1.  However,  this  would  place  the  tape  head  over 
the  special  “end-of-tape”  marker.  Since  we  would  like  the  machine  to  proceed  as  if 
the  tape  were  infinite,  we  must  use  the  tape  maintenance  rule 

Adj(c, c_eot) → ∃c'. Adj(c, c'), Cont(c', □), Adj(c', c_eot)

to  insert  a  new  cell  in  front  of  the  end-of-tape  cell.  This  gives  us  the  transition 

Curr(q_1, c_2), Cont(c_0, 1), Cont(c_1, 0), Cont(c_2, 0),
    Adj(c_0, c_1), Adj(c_1, c_2), Adj(c_2, c_eot)
→ Curr(q_1, c_2), Cont(c_0, 1), Cont(c_1, 0), Cont(c_2, 0), Cont(c_3, □),
    Adj(c_0, c_1), Adj(c_1, c_2), Adj(c_2, c_3), Adj(c_3, c_eot)

which inserts a new blank cell, c_3, in front of the end-of-tape marker. Then the machine
head may be moved right over the new blank square and write a 1. Although
it  would  be  possible  to  apply  a  transition  rule  moving  the  machine  head  over  the 
end-of-tape  marker,  the  end-of-tape  marker  has  no  contents.  Since  each  machine 
move  requires  a  tape  cell  with  some  contents  (possibly  including  the  blank  □),  a 
derivation  that  places  the  Turing  machine  head  over  the  end-of-tape  marker  will 
“hang”  the  machine  and  have  no  effect  on  the  set  of  accepting  computations. 
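The sample computation can be replayed mechanically. The following is a minimal sketch, assuming a state is represented as a Python `Counter` of fact tuples; the helper names `move_right` and `extend_tape`, the `"blank"` symbol, and the cell-naming scheme are illustrative choices and do not come from the paper.

```python
from collections import Counter
from itertools import count

# A state is a multiset (Counter) of facts; a fact is a tuple (predicate, args...).
# Helper names, the "blank" symbol and the cell names are illustrative assumptions.
_fresh_cells = count(3)  # c0..c2 are already on the tape


def move_right(state, q_from, read, q_to, write):
    """Curr(q, c), Cont(c, read), Adj(c, c') -> Curr(q', c'), Cont(c, write), Adj(c, c')."""
    for fact in state:
        if fact[0] != "Curr" or fact[1] != q_from:
            continue
        c = fact[2]
        right = next((g[2] for g in state if g[0] == "Adj" and g[1] == c), None)
        if state[("Cont", c, read)] == 0 or right is None:
            continue  # the rule is not enabled at this head position
        consumed = Counter([fact, ("Cont", c, read)])
        produced = Counter([("Curr", q_to, right), ("Cont", c, write)])
        return state - consumed + produced
    return None  # no instance of the rule applies


def extend_tape(state, eot="c_eot"):
    """Adj(c, eot) -> exists c'. Adj(c, c'), Cont(c', blank), Adj(c', eot)."""
    last = next(f for f in state if f[0] == "Adj" and f[2] == eot)
    c_new = f"c{next(_fresh_cells)}"  # the existential picks an unused name
    produced = Counter(
        [("Adj", last[1], c_new), ("Cont", c_new, "blank"), ("Adj", c_new, eot)]
    )
    return state - Counter([last]) + produced


state = Counter([
    ("Curr", "q0", "c0"), ("Cont", "c0", "1"), ("Cont", "c1", "0"), ("Cont", "c2", "0"),
    ("Adj", "c0", "c1"), ("Adj", "c1", "c2"), ("Adj", "c2", "c_eot"),
])
state = move_right(state, "q0", "1", "q0", "1")  # skip over the leading 1
state = move_right(state, "q0", "0", "q1", "0")  # first 0 seen: switch to q1
state = extend_tape(state)                       # make room to the right of c2
```

The fresh cell name produced by `extend_tape` plays the role of the existentially quantified c'; no function symbols are needed to name new cells.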

In  this  example  we  have  used  existential  quantification  to  avoid  the  unbounded 
use of function applications. It would be possible to implement a Turing machine
without existentials by using Skolem functions.

2.5  Creation,  Consumption,  Persistence 

Some  preliminary  definitions  involve  the  ways  that  a  fact  may  be  created,  preserved, 
or  consumed  by  a  rule.  While  multiple  copies  of  some  facts  may  be  needed  in  some 
derivations,  we  are  able  to  eliminate  the  need  for  multiple  copies  of  certain  facts. 

Definition 2.1. Assume T is a theory and P is a predicate. Any rule has the
form l → r, where l is the facts F_1, ..., F_k on the left hand side, and r is the facts
G_1, ..., G_n, possibly with one or more existential quantifiers, on the right hand side.
A rule in a theory T creates P facts if some P(t) occurs more times in r than in
l. A rule in a theory T preserves P facts if every P(t) occurs the same number of
times in r and l. A rule in a theory T consumes P facts if some fact P(t) occurs
more times in l than in r. A predicate P in a theory T is persistent if every rule in
T which contains P either creates or preserves P facts.

As  an  example,  a  rule  of  form 


Q(x) → Q(y)

does not preserve Q facts, since it can be used to create a fact Q(t) and consume a
fact Q(s).
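The three notions can be checked by counting occurrences on each side of a rule. The sketch below assumes rules are given as pairs of fact lists with schematic terms written as strings. The function names are hypothetical, and `is_persistent` adopts the reading "no rule consumes a P fact", which is how persistence is used in the text that follows the definition.

```python
from collections import Counter


def classify(lhs, rhs, pred):
    """Labels from Definition 2.1 that a rule lhs -> rhs earns for predicate pred."""
    l = Counter(f for f in lhs if f[0] == pred)
    r = Counter(f for f in rhs if f[0] == pred)
    labels = set()
    if any(r[f] > l[f] for f in r):
        labels.add("creates")
    if any(l[f] > r[f] for f in l):
        labels.add("consumes")
    if l == r:
        labels.add("preserves")
    return labels


def is_persistent(theory, pred):
    """Assumed reading: pred is persistent when no rule consumes a pred fact."""
    return all("consumes" not in classify(lhs, rhs, pred) for lhs, rhs in theory)


q_rule = ([("Q", "x")], [("Q", "y")])                  # Q(x) -> Q(y)
copy_rule = ([("P", "x")], [("P1", "x"), ("P", "x")])  # P(x) -> P1(x), P(x)
```

For the rule Q(x) → Q(y), `classify` reports both "creates" and "consumes" and never "preserves", matching the example above.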

Since  a  persistent  fact  is  never  consumed  by  any  rule,  there  is  no  need  to  generate 
more  than  one  copy  of  a  particular  fact  -  as  long  as  that  fact  is  never  needed  more 
than  once  by  a  single  rule.  However,  by  simple  transformation,  it  is  possible  to 
eliminate  the  need  for  more  than  one  copy  of  any  persistent  fact. 

For  example,  a  rule  of  form: 

P(x), P(y), ... → Q(x, y), P(x), P(y), ...

(with P a persistent predicate) can be replaced by rules of form:

P(x) → P_1(x), P(x)
P(x) → P_2(x), P(x)
P_1(x), P_2(y), ... → Q(x, y), P_1(x), P_2(y), ...

(where P_1 and P_2 are persistent predicates).

Definition 2.2. A rule l → r in a theory T is a single-persistent rule if all
predicates that are persistent in theory T appear at most once in l. A theory T is
a uniform theory if all rules in T are single-persistent rules.

Since  any  theory  can  be  rewritten  as  a  uniform  theory,  we  will  assume  that  all 
theories  discussed  from  this  point  forward  are  uniform  theories. 

Definition 2.3. Let P be a set of predicates, each persistent in a uniform theory
T. Two states S and S' are P-similar (denoted S ~_P S') if, after removing all
duplicate persistent P facts from each state, they are equal multisets.
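As a small illustration of P-similarity, the sketch below collapses duplicate persistent facts and then compares the resulting multisets. States are assumed to be `Counter` objects over fact tuples, and the function names are hypothetical.

```python
from collections import Counter


def drop_duplicates(state, persistent):
    """Keep a single copy of each fact whose predicate is in `persistent`."""
    return Counter({f: (1 if f[0] in persistent else n) for f, n in state.items()})


def similar(s1, s2, persistent):
    """P-similarity: equal multisets once duplicate persistent facts are removed."""
    return drop_duplicates(s1, persistent) == drop_duplicates(s2, persistent)


s1 = Counter({("P", "a"): 3, ("Q", "b"): 1})
s2 = Counter({("P", "a"): 1, ("Q", "b"): 1})
s3 = Counter({("P", "a"): 1, ("Q", "b"): 2})
```

Here `s1` and `s2` are similar with respect to {P}, while `s3` differs from both in a non-persistent fact.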

Lemma 2.4. If S ~_P S' and S →* T, then ∃T'. T ~_P T' with S' →* T'.

Proof. We construct the derivation S' →* T' as follows: we use the same rules and
substitutions as the derivation S →* T. This derivation is valid because all rules
are single-persistent, so any rule and substitution used in the original derivation
will also work in the second derivation (all necessary facts are available to enable
the rules). □

A → B : {A, n_a}_{K_b}
B → A : {n_a, n_b}_{K_a}
A → B : {n_b}_{K_b}

Table 1: Needham-Schroeder Public Key Protocol


2.6  Equality  and  Disequality 

The basic MSR framework defined in Section 2.1 can be extended with tests for
disequality of terms using ≠ conditions in rules. In the extension MSR^≠, which we
consider briefly in Section 5, these conditions are allowed only on the left hand sides
of rules, and are not considered to be facts.

We illustrate this by example. Given a rule of form

P_1(t_1, t_2, ...), P_2(u_1, u_2, ...), t_2 ≠ u_2 → Q_1(...), Q_2(...)

if a state S has facts P_1 and P_2 for terms t_1, t_2, ... and u_1, u_2, ... where t_2 is different
from u_2, then a possible next state is S' with facts P_1 and P_2 replaced by facts Q_1
and Q_2.

To summarize, MSR^≠ is the extension of MSR with these extended rules, keeping
the same signature, terms and facts as defined in Section 2.1.

We  do  not  need  to  add  a  condition  to  test  for  equality,  because  it  is  expressible 
by  matching  the  names  of  the  variables  in  the  terms.  For  example,  the  set  of  facts 
{P(a,  b,  c),  Q(a,  d,  e)}  matches  the  left  hand  side  of  the  rule 

P(x, y, z), Q(x, v, w), y ≠ v → R(...), S(...)

while the set {P(a, b, c), Q(a, b, e)} does not. The rule requires that the first
arguments of the facts for predicates P and Q be the same, and that their second
arguments be different.
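The interplay between equality by shared variable names and explicit disequality tests can be sketched as a small matcher. This is an illustrative sketch only: it assumes every pattern argument is a variable and every fact argument is a ground atom, and the function names are not from the paper.

```python
def match_fact(pattern, fact, env):
    """Bind the pattern's variables to the fact's arguments, or return None on a clash."""
    if pattern[0] != fact[0] or len(pattern) != len(fact):
        return None
    env = dict(env)
    for var, value in zip(pattern[1:], fact[1:]):
        if env.setdefault(var, value) != value:
            return None  # the variable is already bound to a different term
    return env


def lhs_matches(patterns, diseqs, facts, env=None):
    """True if every pattern matches a distinct fact and all disequalities hold."""
    env = {} if env is None else env
    if not patterns:
        return all(env[a] != env[b] for a, b in diseqs)
    for k, fact in enumerate(facts):
        extended = match_fact(patterns[0], fact, env)
        if extended is not None and lhs_matches(
            patterns[1:], diseqs, facts[:k] + facts[k + 1:], extended
        ):
            return True
    return False


lhs = [("P", "x", "y", "z"), ("Q", "x", "v", "w")]
diseqs = [("y", "v")]  # the side condition: y differs from v
```

Equality is enforced by reusing the variable x in both patterns, while the disequality is checked only after a complete substitution has been found.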

Computationally, the meaning of ∃ in MSR^≠ is clear: each value generated by
an ∃ is unequal to all others. We have not investigated the correspondence between
logic and MSR^≠.

3  Multiset  Rewriting  for  Protocol  Theories 

In this section we will explain the form of an MSR theory for a class of security
protocols that use public key encryption. We will make an incremental presentation,
starting with some simple protocol roles, then introducing the Dolev-Yao
intruder model and our encryption model, and finally defining the requirements for
a two-phase intruder theory. An example of a full theory for a public key protocol
is presented in Section 4.

Needham-Schroeder Public Key Protocol We will use the Needham-Schroeder
Public Key protocol [NS78] as a running example throughout this paper. The complete
core protocol, which omits the steps that use a trusted server to distribute the
public keys, is shown in Table 1, using a common informal notation.

In the first step, the initiator A (commonly referred to as “Alice”) sends a
message to the responder B (commonly referred to as “Bob”). The message contains
Alice’s name and a freshly chosen nonce, n_a (typically a large random number). The
message is encrypted with Bob’s public key, which means only somebody with Bob’s
private key can decrypt it and understand its contents.

In the second step, Bob replies with a nonce of his own, n_b, along with Alice’s
nonce, both encrypted with Alice’s public key.

In  the  final  step,  Alice  replies  by  returning  Bob’s  nonce,  encrypted  with  his 
public  key. 

3.1  Protocol  Theories 

During  a  network  transaction  involving  an  implemented  security  protocol,  several 
activities  take  place,  possibly  simultaneously.  These  include  key  generation,  key 
distribution,  and  initiation  of  a  protocol  session  by  a  specific  participant.  These 
activities can be arbitrarily interleaved. For example, some public key protocol
sessions  can  take  place  between  Alice  and  Bob,  and  then  later  a  new  participant 
Carol  might  join  them  by  obtaining  a  public  key  certificate  so  that  she  can  also 
converse  with  Alice  and  Bob  in  future  sessions. 

Here  we  introduce  the  notion  of  a  protocol  role  -  specific  steps  of  the  protocol 
meant to be carried out by a single principal. These are referred to as local protocols
by  Woo  and  Lam  [WL93].  A  typical  protocol  includes  at  least  an  initiator  and 
a  responder  role,  and  often  includes  a  trusted  third  party  or  a  server.  Protocol 
analysis  concerns  the  interaction  of  an  arbitrary  number  of  instances  of  arbitrary 
assignments  of  principals  to  roles,  in  the  presence  of  an  intruder  who  can  replay 
messages  and  parts  of  messages  (i.e.  a  Dolev-Yao  intruder). 

In  our  model  we  separate  the  protocol  execution  into  stages.  There  is  an  implicit 
or explicit initialization phase that distributes keys or establishes other shared information.
Following this initialization phase, each agent may choose to carry out the
protocol  any  number  of  times,  in  any  combination  of  roles.  For  example  a  principal 
A  may  play  the  role  of  initiator  twice,  and  responder  once,  during  the  course  of  a 
single  attack.  We  incorporate  these  ideas  into  our  formal  definitions  by  letting  a 
protocol  theory  consist  of  an  initialization  theory,  a  role  generation  theory,  and  the 
disjoint  union  of  bounded  subtheories  that  each  characterize  a  possible  role.  We 
identify  the  syntactic  form  of  a  class  of  well-founded  protocol  theories. 

It is relatively straightforward to use the multiset rewriting framework summarized
in the preceding section to describe finite-state and infinite-state systems.
Using function symbols, it is possible to describe computation over unbounded data
types. In particular, it is easy to encode counter machines or Turing machines (as
we did in Section 2.4), implying that secrecy is undecidable. However, the principal
authentication and secrecy protocols of interest are all of bounded length, and most
use data of bounded complexity (see [CJ97] for a relevant survey). So, it seems
reasonable for our model to represent protocols in a way that reflects their bounded
nature. Thus we assume that the initialization steps are bounded, and that initialization
can be completed prior to the execution of the protocol steps proper. We
also formally define protocol role theories as bounded role theories.

Definition 3.1. A rule R = l → r enables a rule l' → r' if there exist substitutions
σ, σ' such that some fact P(t) ∈ σr is also in σ'l'. A theory T precedes a
theory S if no rule in S enables a rule in T.

Intuitively,  if  a  theory  T  precedes  a  theory  S,  then  no  facts  that  appear  in  the 
left  hand  side  of  rules  in  T  are  created  by  rules  that  are  in  S. 

Definition 3.2. A theory A is a bounded role theory if there is a finite list of
predicates called the role states and written S_0, S_1, ..., S_k for some k, such that for
each rule l → r there is exactly one occurrence of a state predicate in l, say S_i, and
there is exactly one occurrence of a state predicate in r, say S_j. Furthermore, it
must be the case that i < j. We call the first role state, S_0, an initial role state.

By  defining  roles  in  this  way,  we  ensure  that  each  application  of  a  rule  in  A 
advances  the  state  forward.  Each  instance  of  a  role  can  only  result  in  a  finite 
number  of  steps  in  the  derivation. 
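The conditions of Definition 3.2 are purely syntactic, so they can be checked rule by rule. The sketch below assumes rules are pairs of fact lists and that the role states are supplied in order. The role shown uses hypothetical predicates S0, S1, S2 and N that are not taken from the paper.

```python
def is_bounded_role_theory(rules, role_states):
    """Check Definition 3.2 for rules given as (lhs, rhs) lists of fact tuples."""
    index = {name: i for i, name in enumerate(role_states)}
    for lhs, rhs in rules:
        left = [index[f[0]] for f in lhs if f[0] in index]
        right = [index[f[0]] for f in rhs if f[0] in index]
        # exactly one state predicate on each side, and the state index must grow
        if len(left) != 1 or len(right) != 1 or left[0] >= right[0]:
            return False
    return True


# Hypothetical role with states S0 < S1 < S2 and a network predicate N.
states = ["S0", "S1", "S2"]
role = [
    ([("S0",)], [("S1", "x"), ("N", "x")]),
    ([("S1", "x"), ("N", "y")], [("S2", "x", "y")]),
]
looping = role + [([("S2", "x", "y")], [("S1", "x")])]  # steps back to S1
stateless = [([("N", "x")], [("S1", "x")])]               # no state on the left
```

Because the state index strictly increases with every rule application, a single role instance can fire at most k rules, which is the finiteness property noted above.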

Definition 3.3. If A_1, ..., A_k is a set of bounded role theories, a role generation
theory is a set of rules of the form

P(s), Q(t), ... → S_i(r), P(s), Q(t), ...

where P(s), Q(t), ... is a finite list of persistent facts not involving any role states,
and S_i is the initial role state for one of A_1, ..., A_k.

Definition 3.4. A theory S ⊆ T is a bounded sub-theory of T if all formulas on the
right hand side of the rules R in S either contain existentials or are persistent in T.

Definition 3.5. A theory P is a well-founded protocol theory if P = I ⊎ R ⊎ A_1 ⊎ ... ⊎ A_n,
where I is a bounded sub-theory (called the initialization theory) not involving
any role states, R is a role generation theory involving only facts created by I and
the initial role states of A_1, ..., A_n, and A_1, ..., A_n are bounded role theories, with
I preceding R and R preceding A_1, ..., A_n. For role theories A_i and A_j, with i ≠ j,
no role state predicate that occurs in A_i can occur in A_j, and vice-versa.

This  form  allows  derivations  in  a  protocol  theory  to  be  broken  down  into  three 
stages  -  the  initialization  stage,  the  role  generation  stage,  and  the  protocol  execution 
stage.  Tables  5  and  6  show  examples  of  these  theories  for  the  Needham-Schroeder 
Protocol. 

Lemma 3.6. Given a well-founded protocol theory P = I ⊎ R ⊎ A, where I is an
initialization theory, R is a role generation theory, and A is the union of one or
more bounded role theories, if S →* T is a derivation over P, then there exists a
derivation S →* S', S' →* S'' and S'' →* T, where all rules from I are applied
before any rules from R, and all rules from I and R are applied before any rules
from A.

Proof. Since P is a well-founded protocol theory, we know that I precedes R and
I and R precede all of the theories in A. Since no rules in R can enable rules in
I, all rules in I can be applied before any rules in R. Similarly, since no rules in A
can enable rules in I or R, all rules from I and R can be applied before any rules
from A. □


3.2  Encryption-Free  Needham-Schroeder 

As a means of explaining the Dolev-Yao intruder and encryption models using our
notation, we begin with an overly simplified form of the Needham-Schroeder protocol.
Without encryption, the Needham-Schroeder protocol proceeds as follows:


A → B : n_a
B → A : n_a, n_b
A → B : n_b

where n_a and n_b are fresh nonces, chosen by Alice (A) and Bob (B), respectively.


Sorts  The  full  Needham-Schroeder  protocol  uses  several  sorts,  but  here  the  data 
is  all  nonces,  so  we  need  only  one  sort,  nonce.  Later  when  we  add  more  sorts,  we 
will  find  it  convenient  to  define  nonce  as  a  subsort  of  msg. 


Predicates We can describe this simplified protocol in our notation using the
predicates A_i, B_i, N_i, for 0 ≤ i ≤ 3, with the following intuitive meaning:

A_0()                Alice in state 0 (initial role state)
A_1(nonce)           Alice in state 1, with her nonce
A_2(nonce, nonce)    Alice in state 2, with two nonces
B_0()                Bob in state 0 (initial role state)
B_1(nonce, nonce)    Bob in state 1, with two nonces
B_2(nonce, nonce)    Bob in state 2, with two nonces
N_1(nonce)           Network has message 1, with indicated data
N_2(nonce, nonce)    Network has message 2, with indicated data
N_3(nonce)           Network has message 3, with indicated data


The data associated with the state of some principal, or a network message, will
depend on the particular state or message. Each principal begins in local state 0,
with no data. Therefore, predicates A_0 and B_0 are predicates with no arguments.
When Alice chooses a nonce, she moves into local state 1. Therefore, predicate A_1 is
a unary predicate of type nonce, whose argument is the nonce chosen by Alice. Similarly,
predicate B_1 is a binary predicate of type nonce × nonce, the data received from Alice
in message one of the protocol and the nonce chosen by Bob for his response.

B_0(),            A_0()                      →  A_1(n_a), N_1(n_a),              B_0()
A_1(n_a),         B_0(), N_1(n_a)            →  B_1(n_a, n_b), N_2(n_a, n_b),    A_1(n_a)
B_1(n_a, n_b),    A_1(n_a), N_2(n_a, n_b)    →  A_2(n_a, n_b), N_3(n_b),         B_1(n_a, n_b)
A_2(n_a, n_b),    B_1(n_a, n_b), N_3(n_b)    →  B_2(n_a, n_b),                   A_2(n_a, n_b)

Table 2: Sample Trace of Encryption-Free Needham-Schroeder

The subscripts on the message predicates N_i indicate which message of the protocol
is being sent, which implicitly indicates the signature of the message. This
format allows participants to distinguish the messages of a protocol. Since we will be
considering  an  environment  which  includes  an  intruder  (introduced  in  Section  3.3) 
that  can  transform  any  message  from  one  type  to  another,  this  notation  will  not 
limit  the  analysis  in  any  way. 

Rules  Using  these  predicates,  we  can  state  the  protocol  using  four  transition  rules: 

A_0() → ∃x. A_1(x), N_1(x)
B_0(), N_1(x) → ∃y. B_1(x, y), N_2(x, y)
A_1(x), N_2(x, y) → A_2(x, y), N_3(y)
B_1(x, y), N_3(y) → B_2(x, y)

Each  rule  corresponds  to  an  action  by  a  principal.  In  the  first  rule,  Alice  chooses 
a  nonce,  sends  it  on  the  network,  and  remembers  the  nonce  by  moving  into  a  local 
state  that  retains  the  nonce  value.  In  the  second  step,  Bob  receives  a  message  on 
the  network,  chooses  his  own  nonce,  transmits  it  and  saves  both  nonces  in  his  local 
state.  In  the  third  step,  Alice  receives  Bob’s  message  and  replies,  while  in  the  fourth 
step  Bob  receives  Alice’s  final  message  and  changes  state. 

If  we  group  the  transition  rules  into  roles, 

A = { A_0() → ∃x. A_1(x), N_1(x);   A_1(x), N_2(x, y) → A_2(x, y), N_3(y) }

B = { B_0(), N_1(x) → ∃y. B_1(x, y), N_2(x, y);   B_1(x, y), N_3(y) → B_2(x, y) }

we see that A and B are bounded role theories, where A_0 and B_0 are initial role
states, and A_1, A_2, B_1, and B_2 are role states.

Sample Computation Table 2 shows a sample trace generated from these rules,
beginning from state A_0, B_0. Spacing is used to separate the facts that participate
in each step from those that do not.
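A trace of this kind can be generated by a very small rewriting engine. The following is a sketch under simplifying assumptions: every argument in a rule is a variable, existential variables are instantiated with fresh names n0, n1, ..., and the first matching instance of a rule is applied. The names `match_all`, `step` and the rule encoding are illustrative, not part of the paper.

```python
from collections import Counter
from itertools import count

_fresh = count()  # source of fresh nonce names n0, n1, ...


def match_all(patterns, facts, env):
    """Yield (env, used_facts) matching every pattern to a distinct fact."""
    if not patterns:
        yield env, []
        return
    head, rest = patterns[0], patterns[1:]
    for k, fact in enumerate(facts):
        if fact[0] != head[0] or len(fact) != len(head):
            continue
        e = dict(env)
        if all(e.setdefault(v, a) == a for v, a in zip(head[1:], fact[1:])):
            for e2, used in match_all(rest, facts[:k] + facts[k + 1:], e):
                yield e2, [fact] + used


def step(state, rule):
    """Apply the first matching instance of rule = (lhs, fresh_vars, rhs)."""
    lhs, fresh_vars, rhs = rule
    for env, used in match_all(lhs, list(state.elements()), {}):
        env = dict(env)
        for v in fresh_vars:                 # existential: pick unused values
            env[v] = f"n{next(_fresh)}"
        new = [(f[0],) + tuple(env[v] for v in f[1:]) for f in rhs]
        return state - Counter(used) + Counter(new)
    return None                              # rule not enabled


rules = [
    ([("A0",)], ["x"], [("A1", "x"), ("N1", "x")]),
    ([("B0",), ("N1", "x")], ["y"], [("B1", "x", "y"), ("N2", "x", "y")]),
    ([("A1", "x"), ("N2", "x", "y")], [], [("A2", "x", "y"), ("N3", "y")]),
    ([("B1", "x", "y"), ("N3", "y")], [], [("B2", "x", "y")]),
]
state = Counter([("A0",), ("B0",)])
for rule in rules:
    state = step(state, rule)
```

Applying the four rules in order from the state A_0, B_0 ends in a state where both principals have reached their final role state and agree on the two nonces.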

3.3 Formalizing the Intruder

One  of  the  original  motivations  for  using  multiset  rewriting  for  protocol  analysis  was 
that  this  framework  allows  us  to  use  essentially  the  same  theory  for  all  adversaries 
that  follow  the  Dolev-Yao  model,  for  all  protocols.  The  precise  formulation  of  the 
intruder  will  depend  on  the  message  format  of  the  protocol  being  attacked,  and  on 
the  type  of  encryption  used,  but  the  basic  form  of  the  standard  intruder  will  be  the 
same  under  the  Dolev-Yao  model. 

The  Dolev-Yao  protocol  adversary  or  “intruder”  may  nondeterministically  choose 
among  the  following  actions  at  each  step: 

•  Read  any  message  and  block  further  transmission 

•  Decompose  a  message  into  parts  and  remember  them,  including  decrypting 
any  message  for  which  the  adversary  has  obtained  the  key 

•  Generate  fresh  data  as  needed 

•  Compose  a  new  message  from  known  data  and  send 

By  combining  a  read  with  resend,  we  can  easily  obtain  the  effect  of  passively 
reading  a  message  without  preventing  another  party  from  also  receiving  it. 

There  are  two  main  parts  of  the  Dolev-Yao  model  as  commonly  used  in  protocol 
analysis. The first is the set of possible intruder actions, applied nondeterministically
throughout execution of the protocol. The second is a “black-box” model
of  encryption  and  decryption.  We  explain  the  intruder  actions  here,  along  with 
specifying  some  formal  properties  that  are  used  to  bound  the  number  of  intruder 
steps  needed  to  produce  a  given  message.  The  encryption  model  is  presented  in 
Section  3.4. 

Sorts  We  still  have  the  sort  nonce,  but  as  we  will  see  in  Section  3.4,  it  is  convenient 
for  the  intruder  model  to  use  the  sort  msg,  with  nonce  a  subsort  of  msg. 

Functions  We  introduce  a  new  function  for  pairing,  which  is  abbreviated  as 

⟨_, _⟩ : msg × msg → msg

Predicates  We  introduce  the  basic  predicates  D,  M,  and  C  for  the  intruder,  with 
the  following  intuitive  meaning: 

D(msg)    Decomposable messages known to intruder
M(msg)    Information stored in intruder “memory”
C(msg)    Composable messages known to intruder

Later,  when  we  include  encryption  in  our  model  and  more  sortnames  are  added, 
these  predicates  will  become  more  complex,  eventually  reaching  the  form  shown  in 
Table  7. 

Rules In our model, the intruder processes data in two phases. The first phase is to
read and decompose data into parts and remember the parts, and the second phase
is to compose a message from the parts it remembers. We will discuss the two-phase
intruder more formally in Section 3.5. We illustrate the basic form of the intruder
actions for an encryption-free protocol using two of the network-message predicates
from the previous example, N_1(nonce) and N_2(nonce, nonce). Using predicates D
for decomposable messages and M for the intruder “memory”, the basic rules for
intercepting, decomposing and remembering messages are

N_1(x) → D(x)
N_2(x, y) → D(⟨x, y⟩)
D(⟨x, y⟩) → D(x), D(y)
D(x) → M(x)

The rules for composing messages from parts are written using the C, for “composable”,
predicate as follows:

M(x) → C(x), M(x)
C(x) → N_1(x)
C(x), C(y) → C(⟨x, y⟩)
C(⟨x, y⟩) → N_2(x, y)

The rule for generating new data is

→ ∃x. M(x)

The  reason  we  need  the  last  transition  rule  (which  can  be  applied  any  time  without 
any  hypothesis)  is  that  the  intruder  may  need  to  choose  new  data  in  order  to  trick 
an  honest  participant  in  a  protocol.  This  is  illustrated  in  the  sample  computation 
shown  below. 

Note  that  a  simpler  equivalent  intruder  model  can  be  formulated  by  removing  the 
explicit  composition  and  decomposition  predicates.  Specifically,  if  all  C()  and  D() 
predicates  are  replaced  by  M()  and  redundant  rules  are  removed,  the  above  nine  rules 
can  be  reduced  to  seven  rules.  We  choose  to  model  an  explicit  two-phase  intruder  for 
two  reasons.  First,  the  two-phase  model  is  useful  in  directing  proof  search  techniques 
in  an  implementation  based  on  MSR,  such  as  the  LLF  implementation  mentioned 
in [CDL+99]. Second, for our complexity results we need to be able to prove polytime
decidability of the intruder actions. The proof in Lemma 3.13 is facilitated by
the  two-phase  intruder  model,  though  as  we  mention  in  Section  3.5  alternate  proof 
techniques  are  also  available. 

Sample  Computation  An  attack  on  the  encryption-free  (and  obviously  insecure) 
portion  of  the  Needham-Schroeder  protocol  is  shown  in  Table  3.  Here  we  have  the 
actions  of  the  honest  participants  in  the  left  column  and  the  actions  of  the  intruder 
indented.  For  simplicity,  duplicate  copies  of  M(  )  facts  are  not  shown,  since  these 
have  no  effect  on  the  execution  of  the  protocol  or  intruder. 

  B_0(), A_0()                                          Initial configuration
→ A_1(n_a), N_1(n_a), B_0()                             Alice chooses nonce and sends
    → A_1(n_a), B_0(), D(n_a)                           Intruder intercepts message n_a
    → A_1(n_a), B_0(), M(n_a)                           Intruder learns value n_a
    → A_1(n_a), B_0(), M(n_a), M(n)                     Intruder generates fresh value n
    → A_1(n_a), B_0(), M(n_a), M(n), C(n)               Intruder begins composing message
    → A_1(n_a), N_1(n), B_0(), M(n_a), M(n)             Intruder sends n to Bob
→ B_1(n, n_b), N_2(n, n_b), A_1(n_a), M(n_a), M(n)      Bob receives, generates nonce, replies
    → A_1(n_a), B_1(n, n_b), M(n_a), M(n),              Intruder intercepts message with n
      D(⟨n, n_b⟩)
    → A_1(n_a), B_1(n, n_b), M(n_a), M(n),              Intruder decomposes message
      D(n), D(n_b)
    → A_1(n_a), B_1(n, n_b), M(n_a), M(n), M(n_b)       Intruder learns value n_b
    → A_1(n_a), B_1(n, n_b), M(n_a), M(n), M(n_b),      Intruder starts composing message
      C(n_a), C(n_b)
    → A_1(n_a), B_1(n, n_b), M(n_a), M(n), M(n_b),      Intruder composes message
      C(⟨n_a, n_b⟩)
    → A_1(n_a), N_2(n_a, n_b), B_1(n, n_b),             Intruder sends message with n_a
      M(n_a), M(n), M(n_b)
→ A_2(n_a, n_b), N_3(n_b), B_1(n, n_b),                 Alice receives and responds
  M(n_a), M(n), M(n_b)
→ B_2(n, n_b), A_2(n_a, n_b), M(n_a), M(n), M(n_b)      Bob changes to final state, indicating
                                                        successful completion of protocol

Table 3: Sample Attack on Encryption-Free Needham-Schroeder


In  this  attack,  the  intruder  intercepts  messages  between  A  and  B,  replacing 
data  so  that  the  two  principals  have  a  different  view  of  the  messages  that  have 
been  exchanged.  Specifically,  the  intruder  replaces  Alice’s  nonce  na  by  a  value  n 
chosen  by  the  intruder.  When  Bob  responds  to  the  altered  message,  the  intruder 
intercepts  the  result  and  replaces  n  by  na  so  that  Alice  receives  the  message  she 
expects.  Introducing  encryption  eliminates  this  attack. 
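The intruder's part of this attack follows the two-phase pattern: decompose everything intercepted, then compose what is needed. The sketch below is a simplified knowledge-set view of that pattern for the encryption-free case. It assumes atoms are strings and pairs are Python 2-tuples, and the function names are illustrative rather than taken from the paper.

```python
def decompose(messages):
    """Decomposition phase: split pairs (2-tuples) and remember every atom."""
    memory, todo = set(), list(messages)
    while todo:
        m = todo.pop()
        if isinstance(m, tuple):
            todo.extend(m)   # D(<x, y>) -> D(x), D(y)
        else:
            memory.add(m)    # D(x) -> M(x)
    return memory


def can_compose(target, memory):
    """Composition phase: build the target from remembered atoms by pairing."""
    if isinstance(target, tuple):
        # C(x), C(y) -> C(<x, y>)
        return all(can_compose(part, memory) for part in target)
    return target in memory  # M(x) -> C(x), M(x)


# Replay of the attack: the intruder intercepts n_a, invents n,
# and later intercepts Bob's reply <n, n_b>.
intercepted = ["na", ("n", "nb")]
memory = decompose(intercepted) | {"n"}  # "n" comes from the rule -> exists x. M(x)
forged = ("na", "nb")                    # contents of the N2 message Alice expects
```

Since `forged` is composable from the intruder's memory, the intruder can hand Alice exactly the second message she is waiting for, even though Bob never saw n_a.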

3.4  Modeling  Perfect  Encryption 

The  commonly  used  “black-box”  model  of  encryption  may  be  written  in  our  multiset 
notation  using  the  following  vocabulary.  For  concreteness,  we  discuss  public-key 
encryption.  Symmetric  or  private-key  encryption  can  be  characterized  similarly. 

For  simplicity  we  will  identify  principal  identities  with  their  public  keys. 

Sorts  We  introduce  several  new  sorts,  including  cipher  for  ciphertext,  d_key  for 
decryption  keys  and  e_key  for  encryption  keys.  Since  data  can  be  transformed  by 
encryption  into  a  different  type,  and  the  type  of  an  encrypted  message  can’t  be 
known  until  the  message  is  decrypted,  we  choose  to  introduce  an  order-sorted  algebra 
[Gog78].  This  approach  also  serves  to  keep  the  signatures  of  our  functions  and 
predicates  reasonably  simple. 

We  introduce  msg  as  the  super-sort,  with  the  following  relations:  nonce  <  msg, 
cipher  <  msg,  d_key  <  msg,  e_key  <  msg. 

Predicates  The  predicates  from  the  previous  simpler  example  remain,  though 
with  their  sort  types  modified  appropriately  to  account  for  encryption.  For  example 
the  role  states  must  now  remember  information  about  principals’  public  keys,  and 
the  network  messages  are  now  encrypted.  These  predicates  are  described  in  detail 
in  Section  4. 

We  introduce  new  predicates  related  to  management  of  encryption  keys.  KP(e_key,  d_key ) 
is  used  for  associating  public/private  key  pairs,  and  AnnK(e_key)  indicates  that  a 
public  key  has  been  published.  We  also  introduce  the  predicate  GoodGuy(e_key,  d_key), 
which  is  used  to  identify  the  honest  participants  of  the  protocol,  and  their  keys.  The 
list  of  predicates  for  the  full  theory  is  shown  in  Tables  5,  6,  and  7. 

Functions  In  addition  to  the  pairing  function  from  the  previous  section, 

⟨_, _⟩ : msg × msg → msg

we introduce a function for encryption,

enc : e_key × msg → cipher

It is not necessary to include a decryption function dec : d_key × cipher → msg, since
we  write  protocols  using  pattern-matching  (encryption  on  the  left-hand-side  of  a 
rule)  to  express  decryption. 

Rules The core Needham-Schroeder protocol with encryption assumes that each
principal has a previously generated keypair with a published public key. We simulate
this in the initialization theory, with rules of the following form.

→ ∃k_e. ∃k_d. GoodGuy(k_e, k_d), KP(k_e, k_d)
GoodGuy(k_e, k_d) → AnnK(k_e), GoodGuy(k_e, k_d)

The  first  rule  (without  hypothesis)  generates  a  keypair  for  an  honest  principal.  The 
second  rule  announces  the  public  key  so  it  is  accessible  to  other  roles  and  to  the 
intruder. 

New  roles  are  generated  in  the  role  generation  theory,  which  generates  the  initial 
role  states  for  each  instance  of  a  protocol  role.  These  rules  are  of  the  following  form: 

GoodGuy(k_e, k_d) → GoodGuy(k_e, k_d), A_0(k_e)
GoodGuy(k_e, k_d) → GoodGuy(k_e, k_d), B_0(k_e)

Here  any  honest  participant  can  choose  to  participate  as  either  the  initiator  or  the 
responder,  by  generating  the  appropriate  initial  role  state. 

Finally,  the  first  step  of  the  protocol  is  changed  to  include  encryption  and  the 
use  of  the  published  public  keys.  Alice’s  first  step  becomes  the  following: 

AnnK(k'_e), A_0(k_e) → ∃x. A_1(k_e, k'_e, x), N_1(enc(k'_e, ⟨x, k_e⟩)), AnnK(k'_e)

Here  Alice  chooses  to  talk  to  somebody  whose  public  key  has  been  announced.  She 
generates  a  nonce  as  before,  and  then  sends  out  the  first  message  encrypted  by  the 
public  key  she  has  selected. 

The  following  transition  rule  then  allows  Bob  to  decrypt  the  message  from  Alice 
and  send  a  reply. 

B_0(k_e), N_1(enc(k_e, ⟨x, k'_e⟩)), AnnK(k'_e) →
    ∃y. B_1(k_e, k'_e, x, y), N_2(enc(k'_e, ⟨x, y⟩)), AnnK(k'_e)
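Bob's rule expresses decryption purely by pattern matching: the message is accepted only if its encryption key is the key already bound in Bob's role state. A minimal sketch of such one-way matching on nested terms follows. The term encoding (tuples tagged `"enc"` and `"pair"`), the `?`-prefixed variable convention, and the key names are assumptions made for illustration.

```python
def match(pattern, term, env):
    """One-way matching of a pattern (variables start with '?') against a ground term."""
    if isinstance(pattern, str) and pattern.startswith("?"):
        if pattern in env:
            return env if env[pattern] == term else None
        return {**env, pattern: term}
    if isinstance(pattern, tuple) and isinstance(term, tuple) and len(pattern) == len(term):
        for p, t in zip(pattern, term):
            env = match(p, t, env)
            if env is None:
                return None
        return env
    return env if pattern == term else None


# Left hand side pattern N1(enc(ke, <x, ke'>)) from Bob's rule.
bob_pattern = ("N1", ("enc", "?ke", ("pair", "?x", "?kp")))
bob_env = {"?ke": "ke_B"}  # already bound by Bob's role state B0(ke_B)

msg_for_bob = ("N1", ("enc", "ke_B", ("pair", "na", "ke_A")))
msg_for_carol = ("N1", ("enc", "ke_C", ("pair", "na", "ke_A")))
```

A message encrypted under a different key simply fails to match, so no separate decryption function is needed in the rule language.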

The  complete  initialization  theory,  role  generation  theory  and  protocol  theories 
for  Needham-Schroeder  are  shown  in  Section  4. 

Intruder To model the encryption capabilities of the intruder, we add a decomposition
rule and a composition rule of the following basic form to the intruder model. The
decomposition rule allows the intruder to decrypt a message (or part of a message)
when the decryption key is known.

$\mathrm{D}(\mathrm{enc}(k, x)),\ \mathrm{KP}(k, k'),\ \mathrm{M}(k') \to \mathrm{D}(x),\ \mathrm{KP}(k, k'),\ \mathrm{M}(k'),\ \mathrm{M}(\mathrm{enc}(k, x))$

The  composition  rule  allows  the  intruder  to  encrypt  a  message  with  any  encryption 
key  known  to  the  intruder. 

$\mathrm{M}(k),\ \mathrm{C}(x) \to \mathrm{C}(\mathrm{enc}(k, x)),\ \mathrm{M}(k)$

A  complete  example  of  the  rules  defining  an  intruder  for  Needham-Schroeder 
(including  more  complex  sorts  and  some  other  changes  explained  in  Section  3.5),  is 
shown  in  Table  7. 

3.5  Two-Phase Intruder Theory


In  this  subsection,  we  formally  specify  the  properties  of  intruder  theories  that  are 
required  in  order  to  bound  the  number  of  intruder  steps  needed  to  produce  a  given 
message.  We  make  use  of  a  standard  notion  from  proof  theory  based  on  the  normal¬ 
ization  of  proofs  for  natural  deduction.  This  strategy  was  first  applied  to  security 
protocols  by  [CJM98],  who  explain  that  the  actions  of  the  standard  intruder  can  be 
syntactically  separated  into  two  phases  -  a  decomposition  phase  in  which  messages 
are  decomposed  into  smaller  parts,  and  a  composition  phase  in  which  these  parts 
are  (re)assembled  into  a  new  message.  This  two-phase  intruder  provides  us  with  a 
proof  search  strategy  that  is  the  basis  for  the  decidability  of  the  intruder  actions. 

First  we  will  need  to  define  some  new  terms. 

Definition 3.7. The size of an atomic formula is the number of symbols it contains.
We count one for the predicate name, one for each function name, and one for each
variable or constant symbol. We write $|f|$ for the size of atomic formula $f$.

For example, $|P(x,y)| = 3$ and $|P(f(x,y),z)| = 5$.

Definition 3.8. A weighting function is a function $W$ that maps atomic formulas
to numeric weights in $\mathbb{N}$. We write $W(P(x))$ for the weight of atomic
formula $P(x)$. The relative weight of formulas must be preserved under
substitution, i.e., if $A$ and $B$ are atomic formulas and $\sigma$ is a substitution, then

$W(A) > W(B) \Rightarrow W(\sigma A) > W(\sigma B)$.

We  will  use  weighting  functions  to  guarantee  termination. 

Many weighting functions are possible, but we are interested in a class of functions
that calculate the weight based on a pair $(n, P)$, where $n$ is a number indicating
the size of the atomic formula, and $P$ is the predicate name. We define
a strict (non-reflexive) partial ordering on the predicates of a theory. A particular
theory has a particular ordering. For example, in the standard intruder model,
D > M and C > M. An example ordering of the predicates for the intruder theory is
shown in Table 7. The ordering is defined for formulas $P(x)$ and $P'(x')$ as follows:
$(|P(x)|, P) < (|P'(x')|, P')$ if $|P(x)| < |P'(x')|$, or if $|P(x)| = |P'(x')|$ and $P < P'$.

For example, if N, D, and M are predicates with N > D > M, then

$W(\mathrm{D}(\langle x, y \rangle)) > W(\mathrm{D}(x))$

$W(\mathrm{M}(\langle x, y \rangle)) > W(\mathrm{D}(x))$

$W(\mathrm{N}(x)) > W(\mathrm{D}(x))$

An example of a weighting function that conforms to these constraints is as
follows:

$W_1(P(x)) := 10 \cdot |P(x)| + \mathrm{val}(P)$

where val is a function mapping predicate names to numbers, represented as a set
of ordered pairs as follows:

$\mathrm{val} := \{(\mathrm{N}, 4), (\mathrm{D}, 3), (\mathrm{M}, 1), (\mathrm{C}, 3)\}$

Here  the  value  “10”  is  arbitrarily  chosen  to  be  larger  than  any  of  the  values  appearing 
in  the  val  function. 

Definition 3.9. A rule $R = l \to r$ is a decomposition rule with respect to weighting
function $W$ if the total weight of the terms in $r$ is less than the total weight of
the terms in $l$. A rule $R = l \to r$ is a composition rule with respect to weighting
function $W$ if the total weight of the terms in $r$ is greater than the total weight of
the terms in $l$.

For  example, 

$\mathrm{D}(\langle x, y \rangle) \to \mathrm{D}(x),\ \mathrm{D}(y)$

is a decomposition rule with respect to weighting function $W_1$, and

$\mathrm{C}(x),\ \mathrm{C}(y) \to \mathrm{C}(\langle x, y \rangle)$

is a composition rule with respect to weighting function $W_1$.

For the intruder theories we will consider, we allow persistent facts to appear on
both the left-hand and right-hand sides. So, in general, a decomposition rule is of the form

$\mathrm{D}(\langle A, B \rangle),\ P(\ldots) \to \mathrm{D}(A),\ \mathrm{D}(B),\ P'(\ldots)$

where $P$ and $P'$ are sets of persistent predicates, with $P \subseteq P'$ (and similarly for
composition rules).

We also need to introduce more complicated decomposition rules, which we call
"decomposition rules with auxiliary facts". These are pairs of rules of the form

$\mathrm{D}(t),\ P(\ldots) \to P'(\ldots),\ \mathrm{A}(t)$

and

$\mathrm{A}(t),\ Q(\ldots) \to Q'(\ldots),\ \mathrm{D}(t')$

where $P \subseteq P'$, $Q \subseteq Q'$, A < D, and $|t'| < |t|$. Here, A represents an auxiliary fact
(which can appear only in a pair of rules of this form) that is used to amortize the
decomposition of D(t) into D(t') across the two rules. Section 4.4 shows an example
of this type of decomposition rule, used to allow decrypting an old fact with a newly
learned decryption key.

Definition 3.10. A theory $T$ is a two-phase theory if its rules can be divided into
three theories that share no non-persistent predicates, $T = \mathcal{I} \uplus \mathcal{C} \uplus \mathcal{D}$, where $\mathcal{I}$
is a bounded sub-theory preceding $\mathcal{C}$ and $\mathcal{D}$, $\mathcal{C}$ contains only composition rules, $\mathcal{D}$
contains only decomposition rules, and no rules in $\mathcal{C}$ precede any rules in $\mathcal{D}$.

Definition 3.11. A normalized derivation is a derivation where all rules from the
initialization theory are applied first, then all rules from the decomposition theory
are applied before any rules from the composition theory.

It  is  shown  in  [CJM98]  in  a  slightly  different  context  that,  with  the  restriction 
that  keys  must  be  atomic,  all  derivations  in  a  two-phase  theory  can  be  transformed 
into  normalized  derivations. 

Definition 3.12. A protocol theory is limited to atomic keys if its signature does
not contain any functions that return data of type e_key or d_key.

Lemma 3.13. Given a state $S$, a two-phase intruder theory $\mathcal{M}$ with a signature
that is limited to atomic keys, and a target message $X$, it is decidable in polynomial
time whether the message $X$ is derivable from the state $S$ using the theory $\mathcal{M}$.

Proof. We construct a polynomial-time decision procedure for testing whether message
$X$ is derivable from state $S$ using the theory $\mathcal{M}$. We use the decomposition
rules of $\mathcal{M}$ to decompose the state $S$ into a set of base facts $B$. Then we use the
composition rules of $\mathcal{M}$ "backwards" to decompose the goal message $X$ into a set
of base facts $B'$. The message $X$ is derivable if $B' \subseteq B$.

The algorithm is as follows. Let $\mathcal{D}$ be the decomposition theory of $\mathcal{M}$, and let
$\mathcal{C}$ be the composition theory. We write $S - T$ for multiset difference and $S \uplus T$ for
multiset union. If $l \to r$ is a rule and $\sigma$ a substitution, then $\sigma l$ and $\sigma r$ are multisets.

1. Decompose state $S$ into base facts $B$ using rules from $\mathcal{D}$.

(a) $S_0 = S$

(b) Repeat until no more rules in $\mathcal{D}$ can be applied:

i. Find a rule $l \to r$ in $\mathcal{D}$ with $\sigma l \subseteq S_i$, for some substitution $\sigma$.

ii. $S_{i+1} = (S_i - \sigma l) \uplus \sigma r$

(c) $B = S_i$

2. Let $P$ be the persistent facts in $B$ (i.e., for our Standard Intruder, $P$ contains
all the M( ) facts from $B$).

3. Decompose goal message $X$ into base facts $B'$ using rules from $\mathcal{C}$ and the
persistent facts from $B$.

(a) $S_0 = \{X\}$

(b) Repeat until no more rules in $\mathcal{C}$ can be applied:

i. Find a rule $l \to r$ in $\mathcal{C}$ with $\sigma r \subseteq S_i$, for some substitution $\sigma$.

ii. $S_{i+1} = (S_i - \sigma r) \uplus \sigma l$

(c) $B' = S_i$

4. If $B' \subseteq B$ then ACCEPT else REJECT.

This procedure terminates because of the properties of the composition and
decomposition  rules,  which  guarantee  that  the  total  size  of  the  multiset  gets  smaller 
for  each  application  of  a  decomposition  rule,  and  larger  for  each  application  of  a 
composition  rule  (or  in  this  case  smaller,  since  we  are  applying  them  in  reverse).  In 
this  comparison,  the  size  of  the  multiset  is  the  sum  of  the  sizes  of  the  formulas  it 
contains. 

Note that the order in which rules are chosen in Step 1b does not matter. Each rule
selects a particular term and decomposes it, but this does not affect other terms not
mentioned in that rule. So the state $B$ that results in Step 1c is the same no matter
what order the rules are applied in. The same holds for Steps 3b and 3c.

To show correctness of the algorithm, we need to prove both directions. First, if
message $X$ is derivable by theory $\mathcal{M}$ from state $S$, then the procedure ACCEPTS.
Since the message is derivable, it can be obtained from $S$ using a normalized
derivation that applies rules from the decomposition theory $\mathcal{D}$ first, followed by
rules from the composition theory $\mathcal{C}$. Let $d$ be the set of decomposition rules used
in the derivation, and let $d'$ be the decomposition rules used in Step 1. Let $D$ be the
facts derived by applying the rules in $d$. Since we apply all possible decomposition
rules in Step 1, we know that $d \subseteq d'$, so $D \subseteq B$. The derivation uses composition
rules $c$ to construct fact $X$, possibly along with other facts, from $D$. Meanwhile,
Step 3 decomposes fact $X$ into its component facts, the set $B'$, so $B' \subseteq D$. So we have
$D \subseteq B \wedge B' \subseteq D$, which means $B' \subseteq B$, and the procedure ACCEPTS.

For the reverse direction, if the procedure ACCEPTS, then the message $X$ is
derivable by theory $\mathcal{M}$ from state $S$. If the procedure accepts, we can construct a
derivation that applies all the rules from Step 1, then all the rules from Step 3 in
the forward direction. Since $B' \subseteq B$, we know that $X$ can be derived from the facts
in $B$, so a valid derivation of $X$ can be constructed. □

Note that because $\mathcal{M}$ is a two-phase theory, we only need to apply the above
procedure once. No term produced by a rule in $\mathcal{C}$ can appear on the left-hand side
of a rule in $\mathcal{D}$, so applying the composition theory does not enable any new rules to
be applied in the decomposition theory. Without the restriction to atomic keys, the
forward direction of our proof would fail, because the application of rules in $\mathcal{C}$ might
result in the creation of a new key that would allow new messages to be decrypted
using rules in $\mathcal{D}$. Rusinowitch and Turuani present a proof in a slightly different
setting,  representing  message  terms  as  Directed  Acyclic  Graphs,  which  removes  this 
restriction  to  atomic  keys  [RT01].  McAllester  has  also  shown  a  general  method  for 
proving  the  tractability  of  sets  of  inference  rules  [McA93]. 

3.6  Protocol  and  Intruder 

The primary goal of security protocol analysis is to try to find flaws in a protocol,
that is, to find attack scenarios that result in the failure of properties such as secrecy or
authentication, or ultimately to prove that a protocol is correct (i.e., that no attacks
are possible). In the MSR framework, we consider the interaction of a well-founded
protocol theory with a two-phase intruder theory, by analyzing standard
traces of the protocol.

Sorts:
  e_key   : encryption key (and principal name)
  d_key   : decryption key
  cipher  : cipher text (encrypted)
  nonce   : nonces
  msg     : data of any type

Subsorts: nonce < msg, cipher < msg, e_key < msg, d_key < msg

Functions:
  enc                        : e_key $\times$ msg $\to$ cipher  : encryption
  $\langle \_, \_ \rangle$   : msg $\times$ msg $\to$ msg       : pairing

Table 4: Needham-Schroeder Theory Signature

Definition 3.14. Given a well-founded protocol theory $\mathcal{P} = \mathcal{I} \uplus \mathcal{R} \uplus \mathcal{A}$ and a
two-phase intruder theory $\mathcal{M}$, a standard trace is a derivation that has all steps from
$\mathcal{I}$ and $\mathcal{R}$ first, then interleaves steps from the principal theories $\mathcal{A}$ with normalized
derivations from the intruder theory $\mathcal{M}$.

The  notion  of  a  standard  trace  is  a  useful  one  for  reasoning  about  the  complexity 
of  security  protocols,  as  we  will  see  in  Section  5. 

The  intruder  is  easily  formalized  as  a  set  of  rewrite  rules.  While  the  basic  intruder 
steps  remain  the  same  from  one  protocol  to  the  next,  the  exact  formalization  depends 
on  the  form  of  messages  used  in  the  protocol.  A  specific  instance  of  the  standard 
intruder  is  described  in  some  detail  in  Section  4.4. 

4  Example:  Needham-Schroeder  Public  Key  Protocol 

In this section, we present the full theory of the three-step core of the
Needham-Schroeder public-key protocol, which is shown in Table 1. The sorts and functions
for  the  signature  are  shown  in  Table  4,  with  the  predicates  introduced  as  needed 
in  the  table  for  each  sub-theory.  An  example  of  a  two-phase  intruder  is  shown  in 
Table  7. 

4.1  Initialization  Theory 

The Initialization Theory $\mathcal{I}$, for a public key system without a trusted server, is
shown in Table 5. Here, the predicate GoodGuy indicates an uncompromised principal,
parameterized by its encryption and decryption (public and private) keys. For
simplicity, we identify the principal with its public key (i.e., where "A" appears
in the protocol, we use the public key "$K_a$"). The GOODGUY rule allows for the
creation of an unlimited number of principals, each with a unique key pair, denoted
by the predicate KP.

Predicates:
  GoodGuy(e_key, d_key)  : identity of an honest participant
  BadKey(e_key, d_key)   : keys of a dishonest participant
  KP(e_key, d_key)       : encryption key pair
  AnnK(e_key)            : published public key

Initialization Theory $\mathcal{I}$:
  GOODGUY:  $\to \exists k_e. k_d.\ \mathrm{GoodGuy}(k_e, k_d),\ \mathrm{KP}(k_e, k_d)$
  BADKEY:   $\to \exists k_e. k_d.\ \mathrm{BadKey}(k_e, k_d),\ \mathrm{KP}(k_e, k_d)$
  ANNK:     $\mathrm{GoodGuy}(k_e, k_d) \to \mathrm{AnnK}(k_e),\ \mathrm{GoodGuy}(k_e, k_d)$
  ANNKB:    $\mathrm{BadKey}(k_e, k_d) \to \mathrm{AnnK}(k_e),\ \mathrm{BadKey}(k_e, k_d)$

Table 5: Public Key Initialization Theory

The  BADKEY  rule  provides  a  mechanism  for  specifying  an  unlimited  number  of 
compromised  key  pairs,  which  appear  to  belong  to  valid  principals,  but  whose  private 
keys  are  known  to  the  intruder.  The  predicate  BadKey  denotes  these  compromised 
key  pairs.  There  is  no  need  to  distinguish  between  the  case  of  an  honest  participant 
who  does  follow  the  protocol  but  has  had  his  keys  compromised,  and  the  case  where 
the  intruder  himself  is  simply  posing  as  an  honest  participant,  but  is  not  constrained 
to  follow  the  protocol.  This  is  because  the  intruder  can  simply  simulate  the  first 
case, by performing the steps of the protocol, if he wants. That is, there is no need to
have both $\mathrm{GoodGuy}(k, k')$ and $\mathrm{BadKey}(k, k')$ facts generated for the same keys.

We  accomplish  key  distribution  by  having  the  principals  announce  their  public 
keys.  The  ANNK  rule  accomplishes  this  for  the  GoodGuy  participants,  while  the 
ANNKB  rule  does  the  same  for  the  BadKey  pairs.  Note  that  both  rules  generate  a 
predicate  AnnK  indicating  a  public  key  that  is  available  for  communication,  so  from 
this  point  the  honest  participants  can  not  distinguish  the  good  guys  from  the  bad 
guys. 

Note that in order for the Initialization theory to be a bounded sub-theory, as
defined  in  Definition  3.4,  all  the  predicates  that  appear  on  the  left-hand  side  of  the 
rules  (i.e.  GoodGuy,  KP,  BadKey  and  AnnK)  must  be  persistent  in  the  protocol 
theory.  This  can  be  verified  by  examining  how  these  predicates  are  used  in  each  of 
the  rules  of  the  Role  Generation  and  Role  theories  in  Table  6. 

4.2  Role Generation Theory

The Role Generation Theory $\mathcal{R}$ is shown in Table 6. Rules ROLA and ROLB allow
an unlimited number of sessions to be started for any principal to act in the role
of either "Alice" (the initiator) or "Bob" (the responder). $\mathrm{A}_0$ and $\mathrm{B}_0$ denote the
initial role states for the A and B roles, respectively, parameterized by the public
key (principal) acting in that role.

Note that this theory satisfies Definition 3.3, since the rules involve only the
persistent predicate GoodGuy, and the initial role states $\mathrm{A}_0$ and $\mathrm{B}_0$, which appear
only on the right-hand side of the rules.

4.3  Protocol  Role  Theories 

The  Role  Theories,  shown  in  Table  6,  are  derived  directly  from  the  specification  of 
the  Needham-Schroeder  protocol.  Theory  A  corresponds  to  the  role  of  “Alice”,  and 
theory  B  corresponds  to  “Bob” . 

In rule A1, which corresponds to the first line of the protocol, a principal $k_e$,
in its initial state $\mathrm{A}_0$, decides to talk to another principal $k'_e$, whose key has been
announced. A new nonce $x$ is generated, along with a network message $\mathrm{N}_{S1}$
corresponding to the first message sent in the protocol, and the principal moves to
the new state $\mathrm{A}_1$, remembering the values of $x$ and $k'_e$. Note that since AnnK is
persistent, it must also appear on the right-hand side of the rule.

In rule B1, corresponding to the second step of the protocol, a principal $k_e$, in
the initial state $\mathrm{B}_0$, responds to a message on the network that is of the expected
format (i.e., encrypted with $k_e$'s public key, and with the identity of a participant
whose key has been announced embedded inside). $k_e$ generates another nonce and
replies to the message, moving to a new state $\mathrm{B}_1$ where all the information (the two
nonces and the two principals) is remembered.

Similarly, A2 corresponds to the third line of the protocol, and B2 corresponds
to the implicit step where the responder actually receives the final message.

Note that sent messages are denoted by $\mathrm{N}_{Si}$ and received messages are denoted
by a corresponding predicate $\mathrm{N}_{Ri}$. The minimal intruder theory can be thought of
as providing a network that transforms $\mathrm{N}_{Si}$'s to $\mathrm{N}_{Ri}$'s, so the protocol can execute.
There are several ways to encode protocol theories using our MSR formalism. For
example, an alternate encoding could use a single N predicate for all network messages.
In the presence of an intruder, these alternate encodings are all logically equivalent,
because the intruder can transform from one network predicate to another, so we
have chosen the one that seems most convenient for our purposes.
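
To make the role rules concrete, the following Python sketch replays one honest session as a sequence of ground multiset rewriting steps, each of the form "remove the left-hand side, add the right-hand side". It is only an illustration under stated assumptions: the nonces na and nb are picked by hand in place of the existential quantifiers, the forward helper is a hypothetical stand-in for the minimal intruder that turns sent messages into received ones, and the fact encoding and function names are not part of the formalism.

```python
from collections import Counter


def enc(key, body):
    """Ciphertext term enc(key, body); pairs are plain Python tuples."""
    return ("enc", key, body)


def rewrite(state, lhs, rhs):
    """Apply one ground rule instance: new state = (state - lhs) + rhs."""
    lhs, rhs = Counter(lhs), Counter(rhs)
    if any(state[fact] < n for fact, n in lhs.items()):
        raise ValueError("rule instance is not applicable in this state")
    return (state - lhs) + rhs


def forward(state, i, msg):
    """Assumed network step: sent message i becomes received message i."""
    return rewrite(state, [(f"NS{i}", msg)], [(f"NR{i}", msg)])


def honest_run():
    ka, kb, na, nb = "ka", "kb", "na", "nb"
    s = Counter([("AnnK", ka), ("AnnK", kb), ("A0", ka), ("B0", kb)])
    m1, m2, m3 = enc(kb, (na, ka)), enc(ka, (na, nb)), enc(kb, nb)
    # Rule A1: Alice picks Bob's announced key and sends the first message.
    s = rewrite(s, [("AnnK", kb), ("A0", ka)],
                [("A1", ka, kb, na), ("NS1", m1), ("AnnK", kb)])
    s = forward(s, 1, m1)
    # Rule B1: Bob answers with both nonces under Alice's key.
    s = rewrite(s, [("B0", kb), ("NR1", m1), ("AnnK", ka)],
                [("B1", kb, ka, na, nb), ("NS2", m2), ("AnnK", ka)])
    s = forward(s, 2, m2)
    # Rule A2: Alice returns Bob's nonce under Bob's key.
    s = rewrite(s, [("A1", ka, kb, na), ("NR2", m2)],
                [("A2", ka, kb, na, nb), ("NS3", m3)])
    s = forward(s, 3, m3)
    # Rule B2: Bob receives the final message.
    s = rewrite(s, [("B1", kb, ka, na, nb), ("NR3", m3)],
                [("B2", kb, ka, na, nb)])
    return s


print(sorted(honest_run().elements(), key=str))
```

The persistent AnnK facts are consumed and re-added by each rule that mentions them, which mirrors their appearance on both sides of rules A1 and B1.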

4.4  Intruder  Theory 

The  Intruder  Theory,  which  is  an  example  of  a  Standard  Two-Phase  Intruder,  is 
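shown in Table 7. Here the $\mathrm{M}_*$ predicates denote persistent facts known to the
intruder, while D, A and C represent non-persistent facts which can be decomposed
and composed into other facts.

Predicates:
  $\mathrm{A}_0$(e_key)                       : Role state 0 for initiator
  $\mathrm{A}_1$(e_key, e_key, nonce)         : Role state 1 for initiator
  $\mathrm{A}_2$(e_key, e_key, nonce, nonce)  : Role state 2 for initiator
  $\mathrm{B}_0$(e_key)                       : Role state 0 for responder
  $\mathrm{B}_1$(e_key, e_key, nonce, nonce)  : Role state 1 for responder
  $\mathrm{B}_2$(e_key, e_key, nonce, nonce)  : Role state 2 for responder
  $\mathrm{N}_{Si}$(cipher)                   : (i = 1, 2, 3) encrypted message (sent)
  $\mathrm{N}_{Ri}$(cipher)                   : (i = 1, 2, 3) encrypted message (received)

Role Generation Theory $\mathcal{R}$:
  ROLA:  $\mathrm{GoodGuy}(k_e, k_d) \to \mathrm{GoodGuy}(k_e, k_d),\ \mathrm{A}_0(k_e)$
  ROLB:  $\mathrm{GoodGuy}(k_e, k_d) \to \mathrm{GoodGuy}(k_e, k_d),\ \mathrm{B}_0(k_e)$

Protocol Theories $\mathcal{A}$ and $\mathcal{B}$:
  A1:  $\mathrm{AnnK}(k'_e),\ \mathrm{A}_0(k_e) \to \exists x.\ \mathrm{A}_1(k_e, k'_e, x),\ \mathrm{N}_{S1}(\mathrm{enc}(k'_e, \langle x, k_e \rangle)),\ \mathrm{AnnK}(k'_e)$
  A2:  $\mathrm{A}_1(k_e, k'_e, x),\ \mathrm{N}_{R2}(\mathrm{enc}(k_e, \langle x, y \rangle)) \to \mathrm{A}_2(k_e, k'_e, x, y),\ \mathrm{N}_{S3}(\mathrm{enc}(k'_e, y))$
  B1:  $\mathrm{B}_0(k_e),\ \mathrm{N}_{R1}(\mathrm{enc}(k_e, \langle x, k'_e \rangle)),\ \mathrm{AnnK}(k'_e) \to \exists y.\ \mathrm{B}_1(k_e, k'_e, x, y),\ \mathrm{N}_{S2}(\mathrm{enc}(k'_e, \langle x, y \rangle)),\ \mathrm{AnnK}(k'_e)$
  B2:  $\mathrm{B}_1(k_e, k'_e, x, y),\ \mathrm{N}_{R3}(\mathrm{enc}(k_e, y)) \to \mathrm{B}_2(k_e, k'_e, x, y)$

Table 6: Needham-Schroeder Theory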


LRNKB and LRNK are initialization rules that allow the intruder to learn
keys. Since the BadKey and AnnK predicates are generated only by the initialization
theory, we know from Lemma 2.4 that each of these rules needs to be applied only once per
derivation, per BadKey and AnnK fact.

The  REC  and  SND  rules  are  used  to  connect  the  intruder  to  the  network  being 
used  by  the  participants.  The  REC  rule  intercepts  a  message  from  the  network 
and  saves  it  as  a  decomposable  fact.  The  SND  rule  sends  composed  facts  onto  the 
network. 

The COMP rule allows the intruder to compose small terms into larger ones, while
the DCMP rule allows for decomposition of large terms into smaller ones.

LRN  converts  a  decomposable  fact  into  intruder  knowledge,  and  USE  converts 
intruder  knowledge  into  a  composable  fact. 

The DEC rule allows the intruder to decrypt a message if it knows the
private key, and the ENC rule allows it to generate encrypted messages from known public keys.

Note that LRNA and DECA are decomposition rules with auxiliary facts that
handle a special case for encrypted messages. If the message cannot be decrypted
because the key is not currently known, LRNA remembers the encrypted message
with the special "Auxiliary" predicate, A. The DECA rule allows Auxiliary messages
to be decrypted at a later time, if the decryption key becomes known.

Finally,  GEN  allows  the  intruder  to  generate  new  facts  (i.e.  nonces)  as  needed. 
We  could  also  include  rules  to  generate  other  new  data  types,  such  as  dynamic  keys, 
but  we  omit  those  here  because  they  are  not  relevant  to  our  Needham-Schroeder 
example. 

We expand the weighting function described in Section 3.5 to include the predicates
used here, i.e.,

$W_2(P(x)) := 10 \cdot |P(x)| + \mathrm{val}(P)$

where

$\mathrm{val} := \{(\mathrm{N}_{Si}, 4), (\mathrm{D}, 3), (\mathrm{A}, 2), (\mathrm{M}_*, 1), (\mathrm{N}_{Ri}, 4), (\mathrm{C}, 3)\}$

This  Intruder  Theory  can  be  divided  into  Composition  and  Decomposition  rules, 
as  shown  in  Table  7.  So,  this  is  a  Two-Phase  Intruder  Theory  with  respect  to 
weighting  function  W2,  as  described  in  Section  3.1. 

5  Complexity  Results 

In  this  section,  we  investigate  the  complexity  of  secrecy  for  bounded  protocols  of 
a  restricted  form.  More  specifically,  we  give  upper  bounds  that  are  as  general  as 
possible,  and  lower  bounds  that  apply  to  as  restricted  a  subset  of  the  protocol  as 
possible.  In  general,  a  secrecy  specification  stipulates  that  certain  “secret”  data 
must  not  fall  into  the  hands  of  the  intruder.  This  is  a  derivability  or  reachability 
problem in our framework: given an initial secret such as $\mathrm{A}_0(S)$, indicating that a
secret $S$ is known to principal A at the beginning of a protocol run, is there a run
of the protocol and intruder in which the adversary learns $S$, i.e., $\mathrm{M}(S)$ appears in
the state of the system?
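
Variables: $x$ : msg, $y$ : msg, $n$ : nonce, $c$ : cipher, $k_e$ : e_key, $k_d$ : d_key

Predicates:
  $\mathrm{M}_{ek}$(e_key)  : fact in intruder memory (encryption key)
  $\mathrm{M}_{dk}$(d_key)  : fact in intruder memory (decryption key)
  $\mathrm{M}_n$(nonce)     : fact in intruder memory (nonce)
  $\mathrm{M}_c$(cipher)    : fact in intruder memory (ciphertext)
  D(msg)                    : decomposable fact
  C(msg)                    : composable fact
  A(cipher)                 : auxiliary fact (for deferred decryption)

Ordering of Predicates: $\mathrm{N}_{Si} > \mathrm{D} > \mathrm{A} > \mathrm{M}_*$ and $\mathrm{N}_{Ri} > \mathrm{C} > \mathrm{M}_*$
Weighting Function: $W_2$

Initialization Rules:
  LRNKB:  $\mathrm{BadKey}(k_e, k_d) \to \mathrm{M}_{ek}(k_e),\ \mathrm{M}_{dk}(k_d),\ \mathrm{BadKey}(k_e, k_d)$
  LRNK:   $\mathrm{AnnK}(k_e) \to \mathrm{M}_{ek}(k_e),\ \mathrm{AnnK}(k_e)$

I/O Rules:
  REC:  $\mathrm{N}_{Si}(x) \to \mathrm{D}(x)$
  SND:  $\mathrm{C}(x) \to \mathrm{N}_{Ri}(x)$

Decomposition Rules:
  DCMP:   $\mathrm{D}(\langle x, y \rangle) \to \mathrm{D}(x),\ \mathrm{D}(y)$
  LRNEK:  $\mathrm{D}(k_e) \to \mathrm{M}_{ek}(k_e)$
  LRNDK:  $\mathrm{D}(k_d) \to \mathrm{M}_{dk}(k_d)$
  LRNN:   $\mathrm{D}(n) \to \mathrm{M}_n(n)$
  DEC:    $\mathrm{M}_{dk}(k_d),\ \mathrm{KP}(k_e, k_d),\ \mathrm{D}(\mathrm{enc}(k_e, x)) \to \mathrm{M}_{dk}(k_d),\ \mathrm{KP}(k_e, k_d),\ \mathrm{D}(x),\ \mathrm{M}_c(\mathrm{enc}(k_e, x))$
  LRNA:   $\mathrm{D}(\mathrm{enc}(k_e, x)) \to \mathrm{M}_c(\mathrm{enc}(k_e, x)),\ \mathrm{A}(\mathrm{enc}(k_e, x))$
  DECA:   $\mathrm{M}_{dk}(k_d),\ \mathrm{KP}(k_e, k_d),\ \mathrm{A}(\mathrm{enc}(k_e, x)) \to \mathrm{M}_{dk}(k_d),\ \mathrm{KP}(k_e, k_d),\ \mathrm{D}(x)$

Composition Rules:
  COMP:   $\mathrm{C}(x),\ \mathrm{C}(y) \to \mathrm{C}(\langle x, y \rangle)$
  USEEK:  $\mathrm{M}_{ek}(k_e) \to \mathrm{C}(k_e),\ \mathrm{M}_{ek}(k_e)$
  USEDK:  $\mathrm{M}_{dk}(k_d) \to \mathrm{C}(k_d),\ \mathrm{M}_{dk}(k_d)$
  USEN:   $\mathrm{M}_n(n) \to \mathrm{C}(n),\ \mathrm{M}_n(n)$
  USEC:   $\mathrm{M}_c(c) \to \mathrm{C}(c),\ \mathrm{M}_c(c)$
  ENC:    $\mathrm{M}_{ek}(k_e),\ \mathrm{C}(x) \to \mathrm{C}(\mathrm{enc}(k_e, x)),\ \mathrm{M}_{ek}(k_e)$
  GEN:    $\to \exists n.\ \mathrm{M}_n(n)$

Table 7: Two-Phase Intruder Theory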

5.1  Restricted Protocol Form

The  purpose  of  the  initialization  theory,  in  our  framework,  is  to  formalize  the  choice 
of  initial  conditions,  such  as  shared  public  or  private  keys.  However,  as  we  have 
defined  protocol  theories,  there  are  few  restrictions  on  the  form  of  initialization 
theories.  Since  derivability  in  multiset  rewriting  is  undecidable,  we  can  prove  trivial 
lower  bounds  by  encoding  complex  problems  into  the  initialization  theory.  However, 
this  kind  of  lower  bound  would  not  shed  any  light  on  protocol  analysis.  In  order  to 
avoid  this  essentially  degenerate  case,  we  will  analyze  decidability  and  complexity  for 
protocol  theories  with  initialization  theories  that  consist  only  of  a  finite  set  of  ground 
facts,  and  no  rewrite  rules.  Intuitively,  this  means  that  we  analyze  decidability 
and  complexity  of  the  role  generation  and  protocol  execution  phases,  under  the 
assumption  that  initialization  has  already  been  completed. 

In  addition,  there  are  undecidability  results  for  models  that  allow  either  an 
unbounded  number  of  tuplings  [HT96],  or  unbounded  nesting  of  encryption  and 
decryption  [EG83].  So  we  will  consider  derivations  that  limit  both  the  length  of 
messages  and  the  depth  of  encryption,  by  bounding  the  size  of  the  ground  facts  that 
can  appear  in  a  derivation. 

We also restrict the form of the protocol roles. In our previous examples, a step
of a protocol role A has the form

$\mathrm{A}_i(\ldots),\ \mathrm{N}_{Rj}(\ldots),\ P(\ldots),\ Q(\ldots), \ldots \to \exists \ldots .\ \mathrm{A}_k(\ldots),\ \mathrm{N}_{Sl}(\ldots),\ P(\ldots),\ Q(\ldots), \ldots$

where $\mathrm{A}_i(\ldots)$ and $\mathrm{A}_k(\ldots)$ are role states, $\mathrm{N}_{Rj}(\ldots)$ and $\mathrm{N}_{Sl}(\ldots)$ are network messages,
and $P(\ldots), Q(\ldots), \ldots$ are persistent facts appearing on the left and right of the
rule. However, for the purpose of proving a stronger negative result, we restrict
our attention to a simpler form of protocol step in this section, by omitting the
persistent facts.

With these restrictions in mind, we come to the following definitions:

Definition  5.1.  A  role  V  of  agent  A  is  a  restricted  role  if 

•  Its role states are drawn from a finite list of predicates, $\mathrm{A}_1, \ldots, \mathrm{A}_a$.

•  The network predicates $\mathrm{N}_{Rj}$ and $\mathrm{N}_{Sl}$ are drawn from a finite list of predicates
$\mathrm{N}_{R1}, \ldots, \mathrm{N}_{Rn}$ and $\mathrm{N}_{S1}, \ldots, \mathrm{N}_{Sn}$.

•  It contains only rules of the form

$\mathrm{A}_i(\ldots),\ \mathrm{N}_{Rj}(\ldots) \to \exists \ldots .\ \mathrm{A}_k(\ldots),\ \mathrm{N}_{Sl}(\ldots)$

where $\mathrm{A}_i$ and $\mathrm{A}_k$ are role states, and $\mathrm{N}_{Rj}$ and $\mathrm{N}_{Sl}$ are network predicates, with
$i < k < a$ and $j < l < n$, in each rule.

Definition 5.2. A protocol theory $T = \mathcal{I} \uplus \mathcal{R} \uplus \mathcal{A}$ is in restricted form if

•  The initialization theory $\mathcal{I}$ is a set of ground facts.

•  $\mathcal{R}$ is a role generation theory.

•  $\mathcal{A}$ is a finite set of roles, each a restricted role.

5.2  Secrecy Decision Problem

Even with a protocol in restricted form, there are several interesting cases to
consider, depending on whether the number of existentials and roles is bounded,
and on whether the derivation bound on the term size is fixed or varying.

We define a set of protocol scenarios as follows:

$\mathcal{P}_a = \{\langle T, \mathcal{M}, S, n, r, k \rangle \mid$ for ground term $S$ ("the Secret"),
there exists a run of protocol theory $T$ with standard intruder $\mathcal{M}$ that leads to
a state containing $\mathrm{M}(S)$, such that at most $n$ protocol nonces, at most $r$ role
instances, and facts of size at most $k$ appear in the run$\}$.

Intuitively, $\mathcal{P}_a$ is the set of protocol scenarios that contain an attack. Deciding
membership in $\mathcal{P}_a$ is equivalent to deciding if an attack on a protocol exists. There
are a variety of secrecy decision problems that can be defined, depending on how
the parameters of the set are specified. The four cases we consider here are:

For each natural number $k$, $S_{\mathrm{size}=k}$: Given $T, \mathcal{M}, S$, decide if there exist $n, r$
such that $\langle T, \mathcal{M}, S, n, r, k \rangle \in \mathcal{P}_a$.

$S_{\mathrm{nonceb}}$: Given $T, \mathcal{M}, S, n, k$, decide if there exists $r$ such that $\langle T, \mathcal{M}, S, n, r, k \rangle \in \mathcal{P}_a$.

$S_{\mathrm{roleb}}$: Given $T, \mathcal{M}, S, r, k$, decide if there exists $n$ such that $\langle T, \mathcal{M}, S, n, r, k \rangle \in \mathcal{P}_a$.

For each natural number $k$, $S_{\mathrm{nonceb},\mathrm{size}=k}$: Given $T, \mathcal{M}, S, n$, decide if there
exists $r$ such that $\langle T, \mathcal{M}, S, n, r, k \rangle \in \mathcal{P}_a$.

We  are  now  ready  to  present  the  main  results  for  this  section: 


Theorem 1. $S_{\mathrm{size}=k}$ is undecidable for every $k$ greater than some small value.

Theorem 2. $S_{\mathrm{nonceb}}$ with no disequality test is DEXP-complete.

Theorem 3. $S_{\mathrm{roleb}}$ is NP-complete.

Theorem 4. $S_{\mathrm{nonceb},\mathrm{size}=k}$ with no disequality test is in P.


Proof.  Theorem  1  follows  from  the  upper  bound  (Proposition  5.1)  in  Section  5.4.1 
and  the  lower  bound  (Proposition  5.5)  in  Section  5.5.2. 

Theorem  2  follows  from  the  upper  bound  (Proposition  5.2)  in  Section  5.4.2  and 
the  lower  bound  (Proposition  5.6)  in  Section  5.5.3. 

Theorem  3  follows  from  the  upper  bound  (Proposition  5.3)  in  Section  5.4.3  and 
the  lower  bound  (Proposition  5.7)  in  Section  5.5.4. 

The  proof  of  Theorem  4  is  in  Section  5.4.4.  □ 


Open  Problem:  The  series  of  ???  in  the  box  at  the  top  of  column  two  indicates 
an unresolved question for $S_{\mathrm{nonceb}}$, when disequality tests are allowed.

                                  |  Bounded # Roles  |  Unbounded # Roles
                                  |                   |  Bounded $\exists$  |  Unbounded $\exists$
  Term size fixed in all instances |  P                |  P (Thm 4)          |  Undec. (Thm 1)
  Term size varies per instance    |  NPC (Thm 3*)     |  DEXPC (Thm 2)      |  Undec.

Table 8: Protocol Theory Complexity Overview


5.3  Protocol  Complexity  Matrix 

Table 8 shows a summary of the complexity results for the main theorems presented
in this paper. The two main columns consider whether the number of roles is
bounded or unbounded. This refers to the number of instances of each role
(i.e., the number of protocol sessions) that are allowed to participate in a protocol
run. In the left column the role generation theory $\mathcal{R}$ is bounded, meaning we fix
in advance the maximum number $r$ of rules from $\mathcal{R}$ that can be used to generate
a protocol role instance, and then consider all runs with $r$ or fewer roles. In the
right two sub-columns, $\mathcal{R}$ is not bounded, meaning runs with an arbitrary number
of roles need to be considered.

The  second  column  is  further  sub-divided  according  to  whether  the  number  of 
existentials  instantiated  during  execution  in  these  roles  is  bounded  or  not.  If  the 
number  of  existentials  is  bounded,  then  we  fix  in  advance  the  maximum  number  n  of 
protocol  nonces,  and  consider  all  runs  with  n  or  fewer  nonces.  Because  the  number 
is  fixed,  the  nonces  can  be  assumed  to  have  been  produced  during  initialization,  and 
not  within  the  roles  themselves. 

The  two  rows  of  Table  8  consider  whether  the  term  size  k  is  fixed  in  all  instances 
of  the  problem,  or  whether  the  term  size  is  allowed  to  vary  as  a  parameter  of  the 
problem. 

For each entry of the matrix in Table 8, we show the complexity result for
that case, using "P" to indicate the problem is in polynomial time, "NPC" for
NP-complete, "DEXPC" for DEXP-complete, and "Undec." for Undecidable. The entries
also indicate which theorem is applicable in each case.

Table 9 is a more detailed summary of the complexity results, showing the upper
and lower bounds separately. The columns are the same as in Table 8, but now the
two main rows consider whether the intruder is allowed to generate fresh values
or not. These rows are further subdivided according to whether the roles can
perform disequality tests, which would allow them to determine whether two fresh
values are different from each other. The ≠ row allows both equality and
disequality tests, while the = row allows only equality tests. In a protocol, a
test for disequality on a nonce would mean the protocol compares a supposedly
fresh nonce it receives against all the other nonces it has received, to make
sure it is actually fresh. If disequality is not allowed, then this test is not
performed.

                      Bounded # Roles   Unbounded # Roles
                                        Bounded ∃        Unbounded ∃
I with ∃    ≠         (5.3*) NPC        ???              (5.1) Undec.
            =         NPC               (5.2) DEXPC      Undec.
I no ∃      ≠         NPC               DEXPC            Undec.
            =         (5.7) NPC         (5.6) DEXPC      (5.5) Undec.

Table 9: Protocol Theory Complexity Matrix

*A stronger result with no limit on term size is in [RT01, ALV02].

Table  9  shows  the  complexity  results  for  these  cases,  using  the  same  notation  as 
for  Table  8.  The  numeric  references  indicate  the  propositions  about  specific  lower 
or  upper  bounds  which  we  discuss  in  the  following  sections.  With  the  exception  of 
the  top  case  of  column  two  (bounded  roles  with  existentials  and  disequality  test, 
and  intruder  with  unbounded  existentials),  we  will  see  that  the  lower  bounds  apply 
to  all  cases  above  them  in  the  table,  and  the  upper  bounds  apply  to  all  cases  below 
them. 

5.4  Upper  Bounds 

5.4.1  Reachability  for  protocols  is  r.e.  (Thm  1) 

Proposition 5.1. $S_{size=k}$ is recursively enumerable.

Proof.  This  is  immediate  because  we  can  enumerate  all  the  execution  sequences, 
i.e.  all  the  computations  of  the  protocol.  □ 

5.4.2 $S_{nonce_b}$ without disequality is in DEXP (Thm 2)

Proposition 5.2. $S_{nonce_b}$ without disequality tests is in DEXP.

Proof. We prove that $S_{nonce_b}$ without disequality tests has a deterministic
exponential time decision procedure. Given $T, \mathcal{M}, S, n, k$, the algorithm runs in time
$O((|T| + |\mathcal{M}| + n)^k)$, where $|T|$ and $|\mathcal{M}|$ are the sizes of the protocol role theories
and the intruder theory, respectively.

We  restrict  the  protocol  theory  by  placing  a  bound  on  the  number  of  existentials 
that  can  be  generated  during  protocol  execution.  First,  we  consider  the  case  without 
disequality  tests,  where  the  intruder  cannot  generate  any  existentials,  the  number  of 
roles  is  unbounded,  but  the  number  of  existentials  generated  by  the  protocol  theory 
is  bounded.  This  corresponds  to  the  bottom  box  in  the  second  column  of  Table  9. 
Then  we  show  that  the  upper  bound  also  holds  for  the  two  boxes  above  this  one 
in  the  table,  when  we  introduce  the  disequality  test,  and  when  the  intruder  can 
generate  existentials  (but  with  no  disequality  test). 

First we observe that bounding the number of protocol existentials means that
we can generate all the existentials used by all runs of the protocol during the
initialization phase. No new data can be generated during protocol execution, and
$k$ gives a limit on the size of any ground fact that can appear in a run. Therefore
we have a fixed alphabet $\Sigma$ of size $O(|T| + |\mathcal{M}| + n)$, and there is an exponential
$|\Sigma|^k$ bound on the number of distinct ground facts that may appear in any possible
protocol execution.

Next we observe that because role-generation is unlimited and no role creates
new data, each role can be re-run as many times as desired. More specifically, if
$\mathcal{I}$ is the initialization theory, and fact $P(t)$ is in a state $S$ derivable from $\mathcal{I}$ by the
role generation and protocol rules, then there is another state $S' \supseteq S$ containing an
additional occurrence of $P(t)$ that can also be derived from $\mathcal{I}$ by the role generation
and protocol rules. In particular, this means that our algorithm can freely apply any
rule from any role, as many times as needed, because it is always possible to apply
the earlier rules from that role in order to create the required role state for the rule
we want to apply.

In  short,  our  algorithm  can  treat  all  facts  as  if  they  were  persistent  facts,  since 
role-states  can  always  be  regenerated,  and  network  messages  can  always  be  replayed 
by  the  intruder. 

Therefore, we can decide whether a fact $M(S)$ is derivable by the following
decision procedure:

1. set $F$ := a set containing the ground facts from $\mathcal{I}$

2. set $R$ := a set containing all ground instances of the rules from $T + \mathcal{M}$

3. repeat:

(a) Select a rule $l \to r$ from $R$.

(b) Apply the rule if $l \subseteq F$.

(c) $N := r$, i.e. $N$ := the facts generated by applying the rule to $F$.

(d) $F := F + N$

4. until $F + N = F$ for all rules in $R$.

5. if fact $M(S) \in F$, return YES, else return NO.
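The decision procedure above amounts to a fixed-point (saturation) computation. The following is a minimal sketch of that idea, assuming ground facts are represented as hashable Python values and each ground rule as a pair (lhs, rhs) of sets of facts. The function name `derivable` and the toy facts and rules are hypothetical and are not taken from the paper.

```python
def derivable(initial_facts, ground_rules, goal):
    """Saturate the fact set F under the ground rules, treating every fact
    as persistent, and report whether `goal` is ever derived."""
    facts = set(initial_facts)
    changed = True
    while changed:                      # repeat until no rule adds a new fact
        changed = False
        for lhs, rhs in ground_rules:
            # A rule is applicable when its left-hand side is contained in F.
            if lhs <= facts and not rhs <= facts:
                facts |= rhs            # F := F + N
                changed = True
    return goal in facts


# Toy instance (hypothetical facts and rules, for illustration only).
rules = [
    (frozenset({"A0"}), frozenset({"A1", "N(msg1)"})),
    (frozenset({"N(msg1)"}), frozenset({"M(msg1)"})),
    (frozenset({"M(msg1)", "M(key)"}), frozenset({"M(secret)"})),
]
print(derivable({"A0", "M(key)"}, rules, "M(secret)"))  # True
print(derivable({"A0"}, rules, "M(secret)"))            # False
```

Each pass of the outer loop either adds at least one new fact or stops, so the number of passes is bounded by the number of distinct ground facts, which is the quantity bounded by $|\Sigma|^k$ in the argument above.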

Since there is an exponential bound on the number of bounded-length facts
that can be written over the signature, this process will terminate in exponential
time. This decision procedure resembles the DEXP upper bound for Datalog, once
we observe that all role steps can be repeated as many times as needed [DEGV97,
Imm86, Yar82].

In  the  case  where  the  protocol  roles  can  test  for  disequality,  as  long  as  no  new 
nonces  can  be  produced  during  execution  (i.e.  the  intruder  can’t  create  any  nonces), 
the  disequality  tests  just  further  restrict  which  rules  are  applicable  in  a  state.  This 
might  decrease  the  number  of  reachable  facts,  but  the  above  algorithm  still  works, 
and  it  does  not  affect  the  upper  bound. 

Similarly, although we have presented this upper bound in terms of a protocol in
restricted form, the upper bound still holds for protocols whose roles contain
persistent facts, i.e. it holds for roles that are not in restricted form. This is because
the presence of additional persistent predicates in the LHS of rules just further
restricts which rules are applicable in a state. As with disequality tests, this might
decrease the number of reachable facts, but does not affect the upper bound. The
above decision procedure would still work.

The  above  argument  assumes  that  the  number  of  intruder  existentials  is  bounded. 
However,  in  the  case  where  the  roles  cannot  test  for  disequality,  any  attack  with  more 
than  one  nonce  provided  by  the  intruder  can  be  reduced  to  an  attack  with  only  one 
new  nonce  provided  by  the  intruder.  Therefore,  in  the  absence  of  a  disequality  test, 
the  intruder  with  one  new  nonce  is  equivalent  to  the  intruder  with  unlimited  new 
nonces. So this case is also in DEXP. □

5.4.3 Bounded Roles is in NP (Thm 3)

In this section we prove our result for the case of theories with bounded
term size whose signature uses only atomic keys, i.e. theories in which the signature
cannot contain any functions that return messages of type ekey or dkey. Since our
original result (which is unpublished until now, but was mentioned in the
presentation of [DLMS99]), several stronger results have been reported. Both [RT01]
and [ALV02] show that the problem with bounded roles and with unbounded message
size is in NP. In addition, [RT01] extends this result to non-atomic keys, and the
[ALV02] result includes disequality tests.

Proposition 5.3. $S_{role_b}$, with the signature for $T$ and $\mathcal{M}$ restricted to atomic
keys, is in NP.

Proof. We prove that $S_{role_b}$ has a non-deterministic polynomial-time decision
procedure.

Recall from Definition 5.2 that $T = \mathcal{I} \uplus \mathcal{R} \uplus \mathcal{A}$. We restrict the protocol theory
by placing a bound on the number of role instances that can be generated by the
role generation theory $\mathcal{R}$, i.e. we place a limit $r$ on the number of initial role states
that can appear in any run, by limiting how many times the rules in $\mathcal{R}$ can be used.
The number of existentials generated by the intruder is not bounded, and tests for
disequality (as well as equality) are allowed. This corresponds to the top box in the
first column of Table 9.

To  prove  this  upper  bound  we  present  a  decision  procedure  that  takes  as  a  witness 
a  polynomial  length  input  representing  an  instantiation  of  a  sequence  of  protocol 
rules  used  in  an  attack  run.  The  decision  procedure  verifies  that  the  intruder  is 
capable  of  generating  the  messages  necessary  to  make  the  run  valid,  and  it  verifies 
whether  the  run  is  actually  an  attack.  We  show  that  this  decision  procedure  has  a 
polynomial  running  time. 

First  we  show  that  a  candidate  attack  run  is  of  polynomial  length.  Let  R  be  the 
maximum  number  of  steps  in  the  longest  role  of  the  protocol.  Since  we  limit  the 
number  of  roles  to  r,  any  attack  contains  at  most  rR  protocol  steps. 

Next we observe that, although we have allowed the intruder to use an unlimited
number of nonces in his attack, the limited size of the protocol run means
that the number of nonces that are relevant to the attack has a polynomial bound.
Since each message is limited to size $k$, there can be at most $k$ different values in a
given message, or a maximum of $kR$ values for a given role. Thus, the maximum
number of distinct values that can occur in an attack run of at most $r$ roles is
bounded by $B = krR$. For a given attack, at most $B$ of the nonces generated
by the intruder can actually appear in the protocol steps of the run.
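As a quick sanity check of this counting argument, here is a worked instance with hypothetical parameter values (chosen only for illustration, not taken from the paper): message size $k = 4$, longest role $R = 3$ steps, and $r = 5$ role instances.

```latex
\begin{align*}
  \text{values per message} &\le k = 4, \\
  \text{values per role}    &\le kR = 4 \cdot 3 = 12, \\
  B = krR                   &= 4 \cdot 5 \cdot 3 = 60, \\
  \text{protocol steps in a run} &\le rR = 5 \cdot 3 = 15.
\end{align*}
```

Both the number of relevant values and the length of the run are products of the input parameters, which is what makes the witness in the procedure polynomial in size.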

Note  that  the  number  of  nonces  generated  by  the  protocol  is  also  bounded  by 
the  value  B.  As  in  Section  5.4.2,  the  protocol  nonces  can  be  generated  during  the 
role  generation  phase,  instead  of  during  protocol  execution.  So  in  our  analysis  we 
can  treat  protocol  nonces  as  constants  that  are  included  in  the  initial  role  state,  and 
only  consider  the  intruder-generated  nonces  during  the  protocol  execution. 

Using these observations, we can construct a candidate attack run of polynomial
length, as follows:

1. Guess a set of at most $B = krR$ nonces to be used by the intruder in the
attack. These are included in the intruder's initial knowledge, i.e. for each
nonce $n_i \in \{n_0, \ldots, n_B\}$, $M(n_i) \in \mathcal{I}$.

2. Guess a selection of up to $r$ roles to be generated from the role generation
theory $\mathcal{R}$. Note that these role state facts include as arguments any nonces
that would be generated by the protocol roles. Call this set of initial role
states $I_r$.

3. Guess a candidate sequence of up to $rR$ protocol steps for the attack. The
rules used in the candidate attack are fully instantiated, using the values from
$\mathcal{I}$, $I_r$, and the $B$ intruder nonces in a specific way. Let $N \le rR$ be the number
of steps in the candidate sequence, and label each step in the sequence $s_i$, for
$1 \le i \le N$.

Since the protocol is in restricted form, and no nonces are generated by the
protocol roles during the run, we know from Definition 5.1 that all rules must be of
the form:

$A_i(\ldots), N_{Rj}(\ldots) \to A_k(\ldots), N_{Sl}(\ldots)$

where $A_i$ and $A_k$ are role states, and $N_{Rj}$ and $N_{Sl}$ are network predicates. Note that
if the $N_{Rj}$ is missing from the rule (as in the case of the first step of an initiator
role), then a null message body can be used, so all rules can be assumed to be of
this form.

To  verify  that  the  candidate  attack  sequence  is  a  valid  protocol  run,  we  must 
confirm  that  each  step  in  the  sequence  is  feasible.  This  involves  verifying  that  the 
protocol  role  sequence  is  valid  (i.e.  each  role  state  is  used  only  once,  and  they  are  not 
used  in  a  step  until  after  they  have  been  generated),  and  that  the  network  messages 
needed  at  each  step  can  be  generated  by  the  intruder  from  knowledge  it  has  available 
at  that  step. 

Let $S_i$ be the current multiset of facts after step $s_i$. Let $S_0 = I + I_r$, where $I$ is
the multiset of facts from the initialization theory $\mathcal{I}$ and $I_r$ is the set of initial role
states generated in step 2 above. For each rule

$s_i : A_i(m), N_R(n) \to A_j(m'), N_S(n')$

we do the following:

1. Check that $A_i(m) \in S_{i-1}$. If not, REJECT.

2. Decompose the target message $N_R(n)$ into its smallest components by applying
rules from the intruder composition theory in reverse, until a set of persistent
M predicates remain. Call this set of M terms $M$.

3. Check that $M \subseteq S_{i-1}$. If not, REJECT.

4. Fully decompose the term $N_S(n')$ by applying all rules from the intruder
decomposition theory until only the persistent M predicates remain. Call this
set of M terms $M'$.

5. Update $S_i := (S_{i-1} \uplus \{A_j(m')\} \uplus M') - \{A_i(m)\}$.

6. Increment $i$.

After repeating the above procedure for all $N$ rules, check if $M(S) \in S_N$. If yes,
then ACCEPT (meaning this is an attack sequence), if not REJECT.

Since  the  size  of  each  message  is  bounded  by  k,  the  above  steps  can  each  be 
done  in  polynomial  time.  We  do  not  go  into  detail  here  about  steps  2  and  4, 
but  for  our  standard  two-phase  intruder  theory,  we  know  from  Lemma  3.13  that 
they  can  be  accomplished  in  polynomial  time.  Similar  algorithms  are  presented  in 
[CJM98,  RT01]. 
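The two intruder checks (steps 2 and 4 above) can be illustrated with a small two-phase sketch: first close the intruder's knowledge under decomposition, then test whether the target message can be composed from the result. This is a simplified illustration and not the paper's intruder theory. It assumes a hypothetical term representation (atoms as strings, `("pair", a, b)` and `("enc", key, body)` as tuples) and assumes symmetric encryption in which the same atomic key both encrypts and decrypts.

```python
def analyze(knowledge):
    """Close a set of terms under unpairing and under decryption with
    atomic keys that are already known (decomposition phase)."""
    known = set(knowledge)
    changed = True
    while changed:
        changed = False
        for t in list(known):
            parts = []
            if isinstance(t, tuple) and t[0] == "pair":
                parts = [t[1], t[2]]
            elif isinstance(t, tuple) and t[0] == "enc" and t[1] in known:
                parts = [t[2]]
            for p in parts:
                if p not in known:
                    known.add(p)
                    changed = True
    return known


def can_compose(term, known):
    """Check whether `term` can be built from `known` by pairing and
    encryption (composition phase, applied to the target in reverse)."""
    if term in known:
        return True
    if isinstance(term, tuple) and term[0] in ("pair", "enc"):
        return can_compose(term[1], known) and can_compose(term[2], known)
    return False


# Hypothetical intruder knowledge: one encrypted pair and the key k1.
known = analyze({("enc", "k1", ("pair", "na", "k2")), "k1"})
print(can_compose(("enc", "k2", "na"), known))  # True
print(can_compose(("enc", "k3", "na"), known))  # False
```

The decomposition loop adds only subterms of the initial knowledge, and the composition check recurses only on subterms of the target, so both phases are polynomial in the total size of the terms involved.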

Since the number of protocol steps is polynomial, a candidate attack
sequence can be verified in polynomial time. Therefore, $S_{role_b}$ is in NP.

This  upper  bound  also  holds  for  the  case  where  the  intruder  is  not  allowed 
to  generate  nonces,  or  when  disequality  tests  are  not  allowed,  since  the  decision 
procedure  still  works  on  these  more  restricted  protocols.  So  the  upper  bound  holds 
for  all  cases  in  the  first  column  of  Table  9.  □ 

5.4.4 Bounded roles and fixed k is in P (Thm 4)

We view the result for bounded roles and fixed $k$ ($S_{nonce_b,size=k}$) to be of limited
interest, but include it here for completeness.

Proposition 5.4. $S_{nonce_b,size=k}$ without disequality tests is in P.

Proof. The proof technique is the same as for Proposition 5.2. We have a fixed
alphabet $\Sigma$ of size $O(|T| + |\mathcal{M}| + n)$, and there is a $|\Sigma|^k$ bound on the number
of distinct ground facts that may appear in any possible protocol execution. If $k$ is
a constant, then the number of ground terms is polynomial instead of exponential,
and the running time of the algorithm is polynomial.

This case is analogous to the data complexity of Datalog in [DEGV97].

Note that the same restriction on disequality tests applies as in Proposition 5.2.
If the intruder is allowed to generate nonces and the protocol roles are able to test
for disequality, then the alphabet is no longer fixed, and this proof fails. □

5.5  Lower  Bounds 

In  this  section  we  examine  the  lower  bounds.  The  proofs  make  use  of  a  reduction 
from  Horn  clauses  to  protocols  in  restricted  form,  so  we  examine  that  reduction  first. 
Horn  clauses  without  function  symbols  are  undecidable  if  existentials  are  allowed, 
and  DEXP-hard  without  existentials.  Our  proof  is  by  reduction  from  existential 
Horn  clauses  without  function  symbols  to  protocol  theories  in  restricted  form.  The 
reduction  introduces  function  symbols  in  the  protocol  theory,  but  not  in  the  Horn 
clauses. 

Also  note  that  these  proofs  do  not  rely  on  any  use  of  existentials  from  the 
intruder,  nor  on  the  use  of  disequality  tests.  This  means  that  one  lower  bound 
suffices  for  each  column  in  the  complexity  matrix. 

5.5.1  Representing  Horn  Clauses  as  Protocol  Theories 

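An existential Horn clause is a closed first-order formula of the form

$\forall x_1 \ldots \forall x_i [(\alpha_1 \wedge \ldots \wedge \alpha_k) \Rightarrow \exists y_1 \ldots \exists y_j (\beta_1 \wedge \ldots \wedge \beta_l)]$

where $\alpha_1, \ldots, \alpha_k, \beta_1, \ldots, \beta_l$ are first-order atomic formulas.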

We  will  show  that,  given  a  Horn  theory  that  consists  of  a  set  of  existential  Horn 
formulas,  we  can  construct  a  protocol  so  that,  when  combined  with  the  standard 
intruder  theory,  the  intruder  may  learn  a  representation  of  a  formula  iff  it  is  a 
consequence  of  the  Horn  theory. 

Our  encoding  uses  the  intruder  to  replicate  formulas.  Since  each  protocol  role 
can  only  execute  a  finite  sequence  of  steps,  we  use  a  separate  role  for  each  Horn 
clause.  The  function  of  the  intruder  is  to  convert  the  final  message  sent  by  one  role 
to  an  initial  message  received  by  another  role.  As  a  result  of  intruder  actions,  a 
datum  may  pass  through  an  unbounded  number  of  protocol  steps. 

In order to represent the Horn theory faithfully, we cannot give the intruder
complete access to the atomic formulas used in a Horn clause. In particular, we
cannot let the intruder combine data from different messages. For example, if one
role sends a message representing $P(a, b)$, we cannot allow the intruder to intercept
this message and replace it with $P(b, a)$. We prevent this form of interference by
encrypting atomic formulas with a shared private key.

We define an encoding of a conjunction of atomic formulas into a single term of
type msg. We use the notation $\lceil\phi\rceil$ to indicate the encoding of formula $\phi$, where

$\phi = P_1(t_{1,1}, \ldots, t_{1,i_1}) \wedge \ldots \wedge P_k(t_{k,1}, \ldots, t_{k,i_k})$

For this encoding we use a secret key $K$ which is not known by the intruder,
and we assume that for each sequence of predicates $P_1 \ldots P_k$ that occurs together
in the left or right hand side of a given Horn clause, we have a constant symbol
$P_1.P_2.\ldots.P_k$. (Although we have used a sequence of letters, numbers and subscripts
to write out our name for this constant symbol, we assume it is an atomic constant
symbol of the language.)

Given these assumptions, we let

$\lceil P_1(t_{1,1}, \ldots, t_{1,i_1}) \wedge \ldots \wedge P_k(t_{k,1}, \ldots, t_{k,i_k}) \rceil = enc(K, \langle P_1.P_2.\ldots.P_k, t_{1,1}, \ldots, t_{1,i_1}, \ldots, t_{k,1}, \ldots, t_{k,i_k} \rangle)$

Note that the size of the terms arising from $\lceil\phi\rceil$ is linear in the size of the terms
in $\phi$. Specifically, calculating term size as we defined in Definition 3.7, if $\phi$ contains
$p$ predicates each of maximum size $s$, then $|\lceil\phi\rceil| \le 3 + p \cdot s$.
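The encoding can be made concrete with a short sketch. It assumes a hypothetical representation in which a conjunction is a list of (predicate name, argument list) pairs and an encrypted term is the tuple `("enc", key, body)`. The function name `encode` and the default key name `"K"` are illustrative only.

```python
def encode(conjunction, key="K"):
    """Encode a conjunction [(pred, args), ...] as
    enc(K, <P1.P2...Pk, t_11, ..., t_k,ik>)."""
    # The predicate sequence becomes one atomic constant symbol.
    name = ".".join(pred for pred, _ in conjunction)
    # All arguments are flattened, in order, after that constant.
    args = [t for _, ts in conjunction for t in ts]
    return ("enc", key, (name, *args))


phi = [("P", ["a"]), ("Q", ["a", "b"]), ("R", ["b"])]
print(encode(phi))
# ('enc', 'K', ('P.Q.R', 'a', 'a', 'b', 'b'))
```

Because the whole tuple sits under encryption with a key the intruder does not hold, the intruder can forward such a term unchanged but cannot reorder its arguments: `encode([("P", ["a", "b"])])` and `encode([("P", ["b", "a"])])` are different terms.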

We encode a given existential Horn clause $C$ into a set of protocol roles. One role
represents the clause itself, and in addition, for each conjunction of atomic formulas
that appears in the Horn theory, we need a way to create that conjunction from
atomic formulas, and to decompose it into atomic formulas. We define $Role(C) =
\mathcal{R}(C) + \mathcal{C}(C) + \mathcal{D}(C)$, where $\mathcal{R}(C)$ is the role corresponding to clause $C$, and $\mathcal{C}(C)$
and $\mathcal{D}(C)$ are the composition and decomposition roles for clause $C$.

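The role $\mathcal{R}(C)$ for a clause $C$

$\forall x_1 \ldots \forall x_i [(\alpha_1 \wedge \ldots \wedge \alpha_k) \Rightarrow \exists y_1 \ldots \exists y_j (\beta_1 \wedge \ldots \wedge \beta_l)]$

is

$A_0, N_{Ra_0}(\lceil \alpha_1 \wedge \ldots \wedge \alpha_k \rceil) \to \exists y_1 \ldots \exists y_j . A_1, N_{Sa_1}(\lceil \beta_1 \wedge \ldots \wedge \beta_l \rceil)$

where $i, j \ge 0$ and $k, l \ge 1$. For example, the role for a pure Horn clause

$\forall x_1 \ldots \forall x_i [(\alpha_1 \wedge \ldots \wedge \alpha_k) \Rightarrow \beta]$

is

$A_0, N_{Ra_0}(\lceil \alpha_1 \wedge \ldots \wedge \alpha_k \rceil) \to A_1, N_{Sa_1}(\lceil \beta \rceil)$.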

The representation of the composition role $\mathcal{C}(C)$ and the decomposition role
$\mathcal{D}(C)$ is perhaps best illustrated by an example. Suppose that the existential Horn
clause

$\forall x \forall y [(P(x) \wedge Q(x,y) \wedge R(y)) \Rightarrow \exists z (P(z) \wedge Q(y,z))]$

is part of the Horn theory we wish to represent by a protocol. In order to use this
implication, the protocol must produce a message containing $\lceil P(a) \wedge Q(a,b) \wedge R(b) \rceil$
for some $a$ and $b$. However, the protocol roles that represent Horn clauses produce
encodings of conjunctions of atomic formulas, and the atomic formulas here may
come from different rules. Therefore, we need additional protocol roles that select
atomic formulas out of conjunctions and combine them.

The process is very similar to the encoding of the two-phase intruder in
Section 3.5, except that protocol roles can manipulate encrypted values. For each
conjunction form (including variables) that appears on the right-hand side of a Horn
clause, such as $P(z) \wedge Q(y,z)$, we include decomposition roles of the form

$A_0, N_{R0}(\lceil P(z) \wedge Q(y,z) \rceil) \to A_1, N_{S1}(\lceil P(z) \rceil)$

and

$B_0, N_{R0}(\lceil P(z) \wedge Q(y,z) \rceil) \to B_1, N_{S1}(\lceil Q(y,z) \rceil)$

for predicates $A_0$, $A_1$, $B_0$, $B_1$ not used for other roles. We also need a composition
role for the left-hand side of each original Horn clause. For the clause above, the
role will have states $A_0$, $A_1$, $A_2$ and $A_3$. At each step, the role reads one of the atomic
formulas in its target conjunction, sending out either a dummy message or, at the
last step, a message containing the conjunction of atomic formulas needed. In order
to assemble $\lceil P(x) \wedge Q(x,y) \wedge R(y) \rceil$, we can use a role $A$ with the following steps:

$A_0, N_{R0}(\lceil P(x) \rceil) \to A_1(\lceil P(x) \rceil), N_{S1}()$

$A_1(\lceil P(x) \rceil), N_{R2}(\lceil Q(x,y) \rceil) \to A_2(\lceil P(x) \wedge Q(x,y) \rceil), N_{S3}()$

$A_2(\lceil P(x) \wedge Q(x,y) \rceil), N_{R4}(\lceil R(y) \rceil) \to A_3(), N_{S5}(\lceil P(x) \wedge Q(x,y) \wedge R(y) \rceil)$

After this role sends message $N_{S5}$, the intruder can read the data $\lceil P(x) \wedge Q(x,y) \wedge
R(y) \rceil$ contained in this message and forward it to the role representing the Horn
clause with hypothesis $P(x) \wedge Q(x,y) \wedge R(y)$.
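The pattern of the composition role generalizes to any list of atomic formulas. The sketch below generates the rules as plain strings. It assumes a hypothetical textual notation in which `[...]` stands for the encoding of a conjunction and `&` for conjunction, and it follows the numbering of the example above (receive predicates with even indices, send predicates with odd indices). The function name `composition_role` is illustrative only.

```python
def enc(formulas):
    """Textual stand-in for the encoding of a conjunction of atoms."""
    return "[" + " & ".join(formulas) + "]"


def composition_role(atoms, state="A"):
    """Return the rules (as strings) of a role that reads the atoms one at a
    time and sends their conjunction in the last step."""
    rules = []
    for i, atom in enumerate(atoms):
        last = i == len(atoms) - 1
        # Current role state holds the conjunction assembled so far.
        held = f"({enc(atoms[:i])})" if i > 0 else ""
        # The final state is empty; earlier states store the partial conjunction.
        new_state = f"{state}{i + 1}()" if last else f"{state}{i + 1}({enc(atoms[:i + 1])})"
        # Only the last step sends a non-dummy message.
        sent = enc(atoms) if last else ""
        rules.append(
            f"{state}{i}{held}, NR{2 * i}({enc([atom])}) -> {new_state}, NS{2 * i + 1}({sent})"
        )
    return rules


for rule in composition_role(["P(x)", "Q(x,y)", "R(y)"]):
    print(rule)
# A0, NR0([P(x)]) -> A1([P(x)]), NS1()
# A1([P(x)]), NR2([Q(x,y)]) -> A2([P(x) & Q(x,y)]), NS3()
# A2([P(x) & Q(x,y)]), NR4([R(y)]) -> A3(), NS5([P(x) & Q(x,y) & R(y)])
```

A conjunction of $m$ atoms yields a role with exactly $m$ steps, which is the count used for the composition roles in the size argument for the encoding.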

Given a set of existential Horn clauses $H$, we define an encoding into a protocol
theory in restricted form, $\mathcal{P}(H)$, consisting of $Role(\phi)$ for each clause $\phi$ in $H$.

Lemma 5.3. The construction of $\mathcal{P}(H)$ from $H$ is computable in polynomial time.
Furthermore, if $H$ has no existential quantifiers, then $\mathcal{P}(H)$ has no existential
quantifiers.

Proof. If a Horn clause theory $H$ consists of $n$ clauses $\{C_1, C_2, \ldots, C_n\}$ with a
maximum conjunction size of $m$ atomic formulas in any clause, then the corresponding
protocol theory $\mathcal{P}(H)$ contains $n$ 1-step roles $\mathcal{R}(C_i)$, one corresponding to each
clause, plus a set of up to $m$ 1-step decomposition roles in each $\mathcal{D}(C_i)$ for each
conjunction on the right hand side of a clause $C_i$, and an (at most) $m$-step
composition role $\mathcal{C}(C_i)$ for the conjunction on the left hand side of the clause $C_i$. Thus the
encoding is polynomial and $O(mn)$. □

Lemma 5.4. Let $\mathcal{P}(H)$ be the encoding of a set of existential Horn clauses $H$ into
a restricted protocol theory, let $\phi$ be a formula, and let $\mathcal{M}$ be a standard two-phase
intruder theory. A run of $\mathcal{P}(H) + \mathcal{M}$ can lead to a state containing $M(\lceil\phi\rceil)$ iff $\phi$
is derivable from $H$. Furthermore, if the formulas in $H$ have maximum term size
bounded by $s$, then the run $\mathcal{P}(H) + \mathcal{M} \to M(\lceil\phi\rceil)$ has a maximum term size $f(s)$,
where $f$ is a linear function of $s$.

Proof. The proof is by induction on the length of derivations. We need to prove
both directions. First we show that if $H \vdash \phi$ then a run of $\mathcal{P}(H) + \mathcal{M}$ can lead to
a state containing $M(\lceil\phi\rceil)$.

First consider the base case. If $\phi$ is initially true, that is equivalent to a Horn
clause of form

$true \Rightarrow \phi$

From our construction, the clause $C_j$ yields a protocol role

$A_0, N_{RA_0}() \to A_1, N_{SA_1}(\lceil\phi\rceil)$

The intruder can obtain $M(\lceil\phi\rceil)$ by the following sequence:

$\to C()$
$C() \to N_{RA_0}()$
$A_0, N_{RA_0}() \to A_1, N_{SA_1}(\lceil\phi\rceil)$
$N_{SA_1}(\lceil\phi\rceil) \to D(\lceil\phi\rceil)$
$D(\lceil\phi\rceil) \to M(\lceil\phi\rceil)$

This proves the base case.

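For the induction step, assume from our derivation up to clause $C_{j-1}$, we have
proven $H \vdash \{\phi_0, \phi_1, \ldots, \phi_{n-1}\}$ and the intruder knows $\{M(\lceil\phi_0\rceil), M(\lceil\phi_1\rceil), \ldots, M(\lceil\phi_{n-1}\rceil)\}$.
Suppose the next clause $C_j$ in the derivation is

$\phi_0 \wedge \phi_1 \wedge \ldots \wedge \phi_{n-1} \Rightarrow \exists z . \phi_n(z)$

From our construction this yields a protocol theory $Role(C_j) = \mathcal{R}(C_j) + \mathcal{C}(C_j) +
\mathcal{D}(C_j)$, where $\mathcal{R}(C_j)$ is

$A_0, N_{Ra_0}(\lceil \phi_0 \wedge \phi_1 \wedge \ldots \wedge \phi_{n-1} \rceil) \to \exists z . A_1, N_{Sa_1}(\lceil \phi_n(z) \rceil)$

and $\mathcal{C}(C_j)$ is

$C_0, N_{Rc_0}(\lceil\phi_0\rceil) \to C_1(\lceil\phi_0\rceil), N_{Sc_1}()$

$C_1(\lceil\phi_0\rceil), N_{Rc_1}(\lceil\phi_1\rceil) \to C_2(\lceil\phi_0 \wedge \phi_1\rceil), N_{Sc_2}()$

$\ldots$

$C_{n-2}(\lceil \phi_0 \wedge \phi_1 \wedge \ldots \wedge \phi_{n-2} \rceil), N_{Rc_{n-2}}(\lceil\phi_{n-1}\rceil) \to C_{n-1}(), N_{Sc_{n-1}}(\lceil \phi_0 \wedge \phi_1 \wedge \ldots \wedge \phi_{n-1} \rceil)$

Using these roles ($\mathcal{D}(C_j)$ is not needed here), it is possible for the intruder to
construct a run that results in $M(\lceil\phi_n\rceil)$ in a manner similar to the base case above.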

Next we show that if a run of $\mathcal{P}(H) + \mathcal{M}$ can lead to a state containing $M(\lceil\phi\rceil)$
then $H \vdash \phi$.

Initially if the intruder memory contains $M(\lceil\phi\rceil)$, this corresponds to a clause
$true \Rightarrow \phi$.

Suppose the intruder knows $\{M(\lceil\phi_0\rceil), M(\lceil\phi_1\rceil), \ldots, M(\lceil\phi_{n-1}\rceil)\}$ after executing
role $R_{j-1}$, and our corresponding derivation has proven $H \vdash \{\phi_0, \phi_1, \ldots, \phi_{n-1}\}$. From
our construction, the next role $R_j$ used by the intruder to obtain $M(\lceil\phi_n\rceil)$ will be of
type $\mathcal{R}(C)$, $\mathcal{D}(C)$, or $\mathcal{C}(C)$, for some clause $C \in H$.

If the role is of form $\mathcal{R}(C)$, then this corresponds directly to the clause $C$.

If the role is of form $\mathcal{D}(C)$, then this corresponds to a use of the logical axiom

$A \wedge B \wedge C \Rightarrow A$

If the role is of form $\mathcal{C}(C)$, then this corresponds to a use of the logical axiom

$A \wedge B \Rightarrow (A \wedge B)$

So in each case, after the intruder executes the next role in the derivation to obtain
$M(\lceil\phi_n\rceil)$, it is possible to prove $\phi_n$ from $H$ by using a clause $C$ corresponding to
that role.

Consider the maximum term size appearing in the run of $\mathcal{P}(H) + \mathcal{M}$. Each term
occurring in the run (in both the intruder steps and the protocol steps) is just a
predicate applied to $\lceil\phi\rceil$, for some $\phi$ in $H$. So if $n = |\lceil\phi\rceil|$ for the largest term $\phi$ in
$H$, then the maximum term size occurring in the derivation from a protocol step is
just $k = n + 1$. Since $|\lceil\phi\rceil|$ is linear in the size of $\phi$, the maximum term size occurring
in the run is linear in the size of the formulas in $H$.

□ 


5.5.2 $S_{size=k}$ is Undecidable (Thm 1)

For the case of unbounded roles and unbounded existentials (the rightmost column
in Table 9), we turn to results from database theory, where the complexity results
for Embedded Implicational Dependencies (EIDs) [CLM81] and Datalog [DEGV97]
can be applied. Embedded Implicational Dependencies are exactly Horn clauses with
existentials and equality, as defined in Section 5.5.1. In [CLM81], the implication
problem for EIDs is proved to be undecidable, by a reduction from the halting problem
for a two-counter machine. A reduction can also be made from the halting problem for a
Turing machine, as we show in Appendix A.1.

Without restriction on the form of the atomic formulas, undecidability of the
implication problem for existential Horn clauses follows immediately from the
undecidability of Horn clauses without existential quantifiers. The problem of interest to
us, however, is implication when the atomic formulas contain no function symbols.

Proposition 5.5. $S_{size=k}$ is undecidable for every derivation term size greater than
some small value k.

Proof. Our proof is a reduction from existential Horn clauses. Given any set $H$ of
existential Horn clauses with no function symbols, and a formula $\phi$, we can construct
a protocol theory in restricted form $\mathcal{P}(H)$. We know from Lemma 5.3 that the
number of roles in $\mathcal{P}(H)$ is polynomial in the number and size of the Horn clauses
in $H$. If $\mathcal{M}$ is a standard two-phase intruder, then a run of $\mathcal{P}(H) + \mathcal{M}$ can lead
to a state containing $M(\phi)$ iff $\phi$ is derivable from $H$. This follows from Lemma 5.4.
Since the derivability problem is undecidable by reduction from the halting problem
(Lemma A.2), the secrecy problem for protocol theories is also undecidable.

Furthermore, as shown in Appendix A.1, it is possible to encode a Turing machine
using Horn clauses of relatively small size (we use conjunctions of up to 8 predicates,
each with up to 3 arguments). Since we know from Lemma 5.4 that the maximum
term size occurring in the run of $\mathcal{P}(H) + \mathcal{M}$ is linear in the size of the predicates in
$H$, the minimum term size k required for undecidability is also small.  □

Note that the Turing machine encoding in Appendix A.1 leads to the result that
$S_{size=k}$ is undecidable for all values of $k \geq 30$, but we haven’t made any special
effort in our construction to achieve a small term size.

5.5.3  $S_{nonce_b}$ is DEXP-hard (Thm 2)

In the case of no existentials, both [DEGV97] and [CLM81] show a DEXP lower
bound.  In  fact,  DEXP-hardness  follows  by  the  same  encoding  of  Horn  formulas  (Data- 
log  programs,  or  full  embedded  implicational  dependencies)  as  in  our  undecidability 
proof,  applied  to  Horn  clauses  without  function  symbols  and  without  existential 
quantification.  For  these  Horn  theories,  DEXP-hardness  of  the  implication  problem 
(measured  as  a  function  of  the  size  of  the  theory)  is  implicit  in  [Imm86,  Var82],  as 
explained  in  [DEGV97].  The  lower  bound  for  Horn  theories  is  similar  to  the  Turing 
machine  representation  for  the  unbounded  case,  using  a  form  of  “symbolic  counter” 
instead  of  Skolem  symbols  to  name  the  cells  in  a  bounded  section  of  the  Turing 
machine  tape. 

Proposition 5.6. $S_{nonce_b}$ is DEXP-hard.

Proof. Our proof is a reduction from Horn clauses without existential quantifiers.
Given any set $H$ of Horn clauses without existential quantifiers or function symbols,
and a formula $\phi$, we can construct a protocol theory in restricted form $\mathcal{P}(H)$. We
know from Lemma 5.3 that this construction is polynomial in size. If $\mathcal{M}$ is a standard
two-phase intruder, then a run of $\mathcal{P}(H) + \mathcal{M}$ can lead to a state containing $M(\phi)$
iff $\phi$ is derivable from $H$. This follows from Lemma 5.4, since the construction
doesn’t require any protocol or intruder nonces. Since the derivability problem for
non-existential Horn clauses is DEXP-hard by reduction from Deterministic Turing
Machines (DTM) with exponential running time (Lemma A.4), the secrecy problem
for protocol theories with bounded nonces is also DEXP-hard.  □

Note that the Horn clauses constructed in Appendix A.2 use a term size that
is proportional to the running time of the problem (exponential in the input to the
problem, $w$). This means that runs of a protocol $\mathcal{P}(H)$ implementing those Horn
clauses have a maximum term size, $k_w$, which is exponential in the size of the input
to the problem. DEXP-hardness only applies to the secrecy problem $S_{nonce_b}$ for values
of $k > k_w$.

5.5.4  $S_{role_b}$ is NP-hard (Thm 3)

Here we consider a restricted protocol theory where the intruder cannot generate
existentials,  the  number  of  instances  of  each  role  is  bounded,  and  no  disequality  tests 
are  allowed.  This  corresponds  to  the  bottom  box  in  the  first  column  of  Table  9. 

We can prove this problem is NP-hard by reducing Turing machines to Horn
clauses, similar to the proofs for Theorems 1 and 2. A direct Turing machine proof
was used for the result reported in the workshop for [DLMS99]. Subsequently,
several authors have shown a direct reduction from 3-SAT to prove NP-hardness
for this case [AL00, RT01]. Since that proof is straightforward, we reproduce here
the reduction from [RT01], modified to use our restricted protocol theory notation.

A:   A_0,  N_R0(v_1, v_2, ..., v_n)            →  A_1,  N_S1(enc(P, ⟨f_1(V), f_2(V), ..., f_m(V), end⟩))
B:   B_0,  N_R1(enc(P, ⟨T, x, y, z⟩))          →  B_1,  N_S2(enc(P, z))
B':  B'_0, N_R1(enc(P, ⟨enc(K, ¬T), x, y, z⟩)) →  B'_1, N_S2(enc(P, z))
C:   C_0,  N_R1(enc(P, ⟨x, T, y, z⟩))          →  C_1,  N_S2(enc(P, z))
C':  C'_0, N_R1(enc(P, ⟨x, enc(K, ¬T), y, z⟩)) →  C'_1, N_S2(enc(P, z))
D:   D_0,  N_R1(enc(P, ⟨x, y, T, z⟩))          →  D_1,  N_S2(enc(P, z))
D':  D'_0, N_R1(enc(P, ⟨x, y, enc(K, ¬T), z⟩)) →  D'_1, N_S2(enc(P, z))
E:   E_0,  N_R2(enc(P, end))                   →  E_1,  N_S3(S)

Table 10: 3-SAT Theory $\mathcal{P}_{3SAT}$

3-SAT is a version of the satisfiability problem in conjunctive normal form, with
exactly three literals per clause [Sip97, page 249]. We define an instance of 3-SAT,
and some notation, as follows:

•  Propositional Variables $V = \{v_1, v_2, \ldots, v_n\}$.

•  Literals $L = v$ or $L = \neg v$.

•  Clause $C = L \lor L' \lor L''$.

•  Formula $F = C \land C' \land \ldots \land C''$.

Given a formula $F$, let $C_i$ be the i-th conjunct, $L_{i,j}$ be the j-th literal of the i-th
conjunct, and $x_{i,j} \in V$ be the variable appearing in the literal $L_{i,j}$. We can write
$L_{i,j} = x_{i,j}^{e_{i,j}}$, where $e_{i,j} \in \{0, 1\}$ and $x^0 = x$ and $x^1 = \neg x$.

Thus an instance of 3-SAT with variables $V$ and clauses $I$ can be written
$F(V) = \bigwedge_{i \in I} C_i$.

Table 10 shows a protocol theory $\mathcal{P}_{3SAT}$ in restricted form that corresponds to
an encoding of an instance of the 3-SAT problem. This is a general construction for
all instances of 3-SAT, with the particular instance encoded into the details of the
initialization role, role A. The input to role A is a message containing assignments of
$T$ and $\neg T$ to each of the propositional variables $v_i$. It outputs a message representing
the 3-SAT encoding with a special token end appended, encrypted with the secret key
P, which is shared by the protocol roles but not known to the intruder. This message
encodes each clause $C_i$ using the function $f_i(\,)$, which is described below. The roles
B, B', C, C', D, D' each examine a 3-tuple of the encoding; if it contains either $T$
or $\{\neg T\}_K$ (both representations of “true”) in the position that role checks, the role
strips off the 3-tuple and returns the rest of the encoding. The final role E broadcasts
the secret S if it receives a message containing the special token $\{end\}_P$.

In the encoding of the literals, we use encryption to represent logical negation,
i.e. $\neg x = \{x\}_K$, and $x = \{\neg x\}_K$. We introduce the function $g(\,)$ to formalize this:

•  $g(0, x) = x$

•  $g(1, x) = \{x\}_K$

Finally, each clause $C_i$ is encoded by a function $f_i(\,)$ in role A as follows:

•  $\forall i \in I$, $f_i(V) = \langle g(e_{i,1}, x_{i,1}),\ g(e_{i,2}, x_{i,2}),\ g(e_{i,3}, x_{i,3}) \rangle$

For example, if the clause $C_1 = v_1 \lor \neg v_2 \lor v_4$, then
$f_1(V) = \langle g(0, v_1), g(1, v_2), g(0, v_4) \rangle = \langle v_1, \{v_2\}_K, v_4 \rangle$.
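
The encoding functions can be sketched in a few lines of Python. This is only an illustrative sketch: the tuple representation of terms, the helper `enc`, and the `(variable, e)` representation of literals are assumptions made here, not notation from the paper.

```python
# Illustrative sketch of the literal/clause encoding (representation is assumed).
# A symbolic encryption {payload}_key is modeled as the tuple ("enc", key, payload).

def enc(key, payload):
    """Symbolic encryption term {payload}_key."""
    return ("enc", key, payload)

def g(e, x):
    """g(0, x) = x and g(1, x) = {x}_K: encryption under K stands for negation."""
    return x if e == 0 else enc("K", x)

def f(clause):
    """Encode a clause given as three (variable, e) pairs, with e = 1 for a negated literal."""
    return tuple(g(e, var) for var, e in clause)

# The clause C_1 = v1 OR (NOT v2) OR v4 from the text.
c1 = [("v1", 0), ("v2", 1), ("v4", 0)]
encoded = f(c1)
```

Here `encoded` is the triple whose middle component is the symbolic term for $\{v_2\}_K$, matching the worked example above.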

Proposition 5.7. $S_{role_b}$ is NP-hard.

Proof. By reduction from 3-SAT. By construction, the intruder can guess a solution
and run to completion iff the instance of 3-SAT is satisfiable.

Given a set of propositional variables $V$, a set of clauses $I$, and an instance of
3-SAT $F(V) = \bigwedge_{i \in I} C_i$, construct $\langle T, \mathcal{M}, S, n, r, k \rangle$, where $S$ is a secret, $\mathcal{M}$ is a
standard intruder, and $T$ is the 3-SAT theory constructed as described above and
shown in Table 10. Specifically, $r = |I|$, i.e. the number of role instances needed is
equal to the number of 3-SAT conjuncts. The number of nonces needed is zero. The
maximum term size that appears in the attack, $k_I$, is proportional to the number of
clauses in $I$, since the largest term appearing in the attack will be the output term
$N_{S1}(\,)$ from role A of the protocol. So, the attack is possible for any $k \geq k_I$.

Given $\langle T, \mathcal{M}, S, n, r, k \rangle$, with $n \geq 0$, $r \geq |I|$ and $k \geq k_I$, and an initial intruder
knowledge of $\{M(T), M(\neg T)\}$, the intruder can learn the secret $M(S)$ by broadcasting
a message containing a solution of the 3-SAT problem on the network and then
transforming the various network message formats to make the protocol run. Thus,
there is an attack on the protocol $\mathcal{P}_{3SAT}$ iff the corresponding 3-SAT problem has
a solution.  □
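
To make the reduction concrete, the following Python sketch simulates the roles of Table 10 symbolically. It is a simplified illustration under stated assumptions: the encoded formula is kept as a Python list instead of a nested encrypted term, the intruder's guess is replaced by exhaustive enumeration, and all function names are hypothetical.

```python
from itertools import product

# Assumed symbolic representation: "T" is true, ("not", "T") is its negation.
TRUE, FALSE = "T", ("not", "T")

def enc(key, payload):
    return ("enc", key, payload)

def g(e, x):
    return x if e == 0 else enc("K", x)

def role_a(formula, assignment):
    """Role A: substitute the intruder's assignment and append the end token."""
    return [tuple(g(e, assignment[v]) for v, e in clause) for clause in formula] + ["end"]

def is_true(term):
    """Roles B..D' accept either T or {not T}_K in the position they check."""
    return term == TRUE or term == enc("K", FALSE)

def run_protocol(formula, assignment):
    """Strip one clause per role instance; role E fires only if 'end' is reached."""
    msg = role_a(formula, assignment)
    while msg[0] != "end":
        if not any(is_true(t) for t in msg[0]):
            return False          # no role among B..D' can consume this 3-tuple
        msg = msg[1:]
    return True                   # role E broadcasts the secret

def intruder_finds_attack(formula, variables):
    """Try every assignment; an attack exists iff some assignment satisfies the formula."""
    return any(run_protocol(formula, dict(zip(variables, values)))
               for values in product([TRUE, FALSE], repeat=len(variables)))

satisfiable = [[("v1", 0), ("v2", 1), ("v3", 0)], [("v1", 1), ("v2", 1), ("v3", 1)]]
unsatisfiable = [[("v1", 0)] * 3, [("v1", 1)] * 3]
```

The exhaustive loop stands in for the nondeterministic guess in the proof; it takes exponential time in the number of variables, which is consistent with the NP-hardness claim rather than a contradiction of it.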


6  Examples:  Lower  Bounds  as  Protocols 

The  previous  section  examined  the  complexity  of  security  protocols  in  terms  of  the 
Multiset  rewriting  formalism,  but  it  may  be  useful  to  examine  the  phenomena  that 
cause  protocols  to  be  difficult  to  analyze,  using  a  more  common  and  less  formal 
notation.

Keys:  K - symmetric key shared by A, B_i, C

Server/Client Protocols:

A → B_1 :  {x_1, x_2, x_3, 0}_K
B_1 → A :  {x_1, x_2, x_3, 1}_K

A → B_2 :  {x_1, x_2, 0, 1}_K
B_2 → A :  {x_1, x_2, 1, 0}_K

A → B_3 :  {x_1, 0, 1, 1}_K
B_3 → A :  {x_1, 1, 0, 0}_K

A → B_4 :  {0, 1, 1, 1}_K
B_4 → A :  {1, 0, 0, 0}_K

Audit Protocols:

A → C :  {0, 0, 0, 0}_K
C → A :  OK

A → C :  {1, 1, 1, 1}_K
C → A :  SECRET

Table 11: Rules for Exponential Protocol, s = 4


6.1  An  Exponential  Attack  Without  Nonces 

Here  we  present  a  simple  protocol  construction  that  gives  some  intuition  for  the 
exponential  lower  bound.  This  is  not  a  formal  proof,  but  it  shows  an  example  where 
even  without  generating  new  data,  determining  a  security  property  may  require 
exponentially-many  runs  of  a  protocol.  This  particular  example  was  helpful  to  the 
authors  in  gaining  the  intuition  that  inspired  the  undecidability  and  exponential 
lower  bound  results. 

Consider the following example, which shows a fragment of an audited key
distribution protocol, for one key server and s clients. The protocol for integer s assumes
that a private symmetric key K is shared between the principals $A, B_1, \ldots, B_s$ and
C. (The same effect can be achieved in a public key protocol, by first running secure
key exchange steps.) Here A is a key server, $B_1, \ldots, B_s$ are clients, and C is an audit
process.

In Table 11 we show the protocol for s = 4. There are s Server/Client
sub-protocols, one for each client. In these sub-protocols A sends a value which
corresponds to a certain binary pattern, and $B_i$ responds by incrementing the pattern by
one. We use the notation $x_i$ to indicate the “don’t care” values in the messages in
the Server/Client sub-protocols. For example, in the protocol between A and $B_1$,
in the first step A sends a message $\{x_1, x_2, x_3, 0\}_K$, which consists of four digits,
ending in a zero, encrypted by the key K. The first three digits in this message are
represented by $x_1$, $x_2$, and $x_3$, indicating that $B_1$ doesn’t check those values (they
can be either 0 or 1). In the second step, if $B_1$ sees the 0 he is expecting in the last
digit of the message he received, he responds by incrementing the value he received
from A, i.e. $B_1$ sends $\{x_1, x_2, x_3, 1\}_K$, where $x_1$, $x_2$, and $x_3$ are the same values
received in Step 1.

The protocol suite also includes two audit sub-protocols. In the first protocol
the server A sends a message of all zeros to C to indicate that the protocol finished
correctly. In the second protocol, A sends a message of all ones to indicate that
there is an error. The second audit protocol has the side-effect of broadcasting the
SECRET if C receives the error message.

If no attacker is present, a run of this protocol would consist of 2s + 1 messages,
with the final message of all zeros sent to C, which responds with OK. However,
if a Dolev-Yao intruder is present, he can route an initial message of all 0’s from A
through $2^s - 1$ B principals in repeated runs of the protocol, thus building a message
consisting of all 1’s, which he can send to C to cause the SECRET to be broadcast.
It is easy to see that unless an exponential number of messages are sent, as long as
A always uses 0 for all the $x_i$ positions, the SECRET remains secret.
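
The counting argument can be checked with a small simulation. The sketch below is an illustration only: it models each client $B_i$ as a pattern match on a list of bits and counts how many client runs the intruder needs to turn the all-zero message into the all-one message. The function names are hypothetical.

```python
def client_step(bits):
    """Find the one client B_i that accepts the message and return (i, its reply).

    B_i expects a 0 followed by i-1 ones at the end of the message and replies
    with a 1 followed by i-1 zeros, i.e. it increments the binary pattern by one.
    """
    s = len(bits)
    for i in range(1, s + 1):
        pos = s - i
        if bits[pos] == 0 and all(b == 1 for b in bits[pos + 1:]):
            return i, bits[:pos] + [1] + [0] * (i - 1)
    return None  # the all-ones message matches no client

def count_attack_runs(s):
    """Number of client runs needed to go from all zeros to all ones."""
    bits, runs = [0] * s, 0
    while any(b == 0 for b in bits):
        _, bits = client_step(bits)
        runs += 1
    return runs

runs_for_four_clients = count_attack_runs(4)
```

For s = 4 the loop performs 15 client runs, which is $2^s - 1$, and the same relation holds for the other small values of s covered by the checks.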

An interesting aspect of the protocol above is that it shows that a protocol can
be secure against polynomial-time attack, but considered insecure under Dolev-Yao
assumptions.


6.2  A  Class  of  Undecidable  Protocols 

We  will  generalize  the  exponential  protocol  used  in  the  previous  example  by  adding 
nonces  to  construct  an  undecidable  protocol  that  uses  small  message  size.  First,  we 
briefly  review  the  Post  correspondence  problem. 

A well known example of an undecidable problem is the Post correspondence
problem (PCP) [Pos46], which concerns simple manipulation of strings. This problem
can be formally described (as in [Sip97, page 184]) as follows:

An instance of the PCP is a collection P of tiles:

$P = \left\{ \left[\frac{t_1}{b_1}\right], \left[\frac{t_2}{b_2}\right], \ldots, \left[\frac{t_k}{b_k}\right] \right\}$

and a match is a sequence $i_1, i_2, \ldots, i_l$, where $t_{i_1} t_{i_2} \cdots t_{i_l} = b_{i_1} b_{i_2} \cdots b_{i_l}$. The problem
is to determine whether P has a match. Let

$PCP = \{\langle P \rangle \mid P$ is an instance of the Post correspondence problem with a match$\}$.

We  define  the  width  of  a  PCP  instance  as  the  length  of  the  longest  string  in  a  tile. 
The  size  of  a  PCP  instance  is  the  number  of  tiles  in  P.  PCP  is  undecidable  for 
relatively  small  problem  sizes.  In  particular,  it  has  been  proven  to  be  undecidable 
for  size  7  [MS96]. 

We propose a protocol that makes it possible to construct a “route” consisting
of two strings, a “path” and a “return path”, by selecting from a set of sub-routes.
Each sub-route (which we call a “tile”) will actually contain two parts, one for the
forward path (which we call the “top”), and one for the return path (which we call
the “bottom”). The intent is that the set of sub-route pieces (the set of tiles) has
been chosen in such a way as to make it impossible, or at least extremely difficult,
to construct a sequence of tiles such that the top path and the bottom path are the
same.

Keys:  K - A’s session key;  T, B - long term secrets shared by C, T_i
Constants:  EOF_T, EOF_B - indicate the end of a path

Tiles = [abc/bc], [ac/cab], [b/bca], ...

[abc/bc]   A → T_1 :  {t_1, b_1}_K
           T_1 → A :  {n_1, a, n_2}_T, {n_2, b, n_3}_T, {n_3, c, t_1}_T,
                      {n_4, b, n_5}_B, {n_5, c, b_1}_B, {n_1, n_4}_K

[ac/cab]   A → T_2 :  {t_1, b_1}_K
           T_2 → A :  {n_1, a, n_2}_T, {n_2, c, t_1}_T, {n_3, c, n_4}_B,
                      {n_4, a, n_5}_B, {n_5, b, b_1}_B, {n_1, n_3}_K

[b/bca]    A → T_3 :  {t_1, b_1}_K
           T_3 → A :  {n_1, b, t_1}_T, {n_2, b, n_3}_B, {n_3, c, n_4}_B,
                      {n_4, a, b_1}_B, {n_1, n_2}_K

Checker:

A → C :  {t_1, b_1}_K, {t_1, X, t_2}_T, {b_1, X, b_2}_B
C → A :  {t_2, b_2}_K

A → C :  {t_1, b_1}_K, {t_1, X, EOF_T}_T, {b_1, X, EOF_B}_B
C → A :  SECRET

Table 12: Rules for Path Protocol

Table  12  shows  an  example  of  part  of  our  proposed  path  protocol,  for  a  particular 
set  of  tiles.  The  “Tiles”  roles  are  used  to  build  a  complete  route  out  of  sub-route 
tiles.  Each  role  adds  its  tile,  one  node  at  a  time,  to  the  front  of  the  current  route, 
building  a  linked  list  for  the  top  and  bottom  paths.  For  an  arbitrary  set  of  tiles,  the 
number  of  “Tiles”  roles  is  equal  to  the  number  of  tiles  in  the  set,  and  the  number 
of  messages  broadcast  by  each  role  corresponds  to  the  width  of  the  tiles. 

The  “Checker”  roles  are  included  to  allow  a  designer  to  determine  if  they  have 
selected  a  “good”  set  of  sub-route  tiles,  by  testing  with  a  protocol  analyzer  to  see  if  it 
is  possible  to  create  a  route  with  identical  forward  and  backward  paths,  using  these 
tiles.  The  “Checker”  protocol  steps  through  the  links  in  a  route  and  broadcasts  the 
message  SECRET  if  all  nodes  on  the  top  and  bottom  paths  match  (indicating  an 
error).

In this example, we assume the values $n_i$ are nonces, that K is a session key,
and T and B are long term secrets used to encode the top and bottom paths,
respectively. (We use two different keys to serve the additional purpose of
providing typing information that distinguishes messages that are part of the top path
from messages that are part of the bottom path. This typing information could instead
be part of the message payload, and then one key would suffice.) The constants $EOF_T$
and $EOF_B$ are used to mark the end of the path chain.

To build a path using the “Tiles” protocol, a client establishes a session key K,
and then builds a path by contacting various $T_i$ roles to add a tile to the path. An
example of a session using the tiles shown in Table 12 is:


A → T_1 :  {EOF_T, EOF_B}_K
T_1 → A :  {n_1, a, n_2}_T, {n_2, b, n_3}_T, {n_3, c, EOF_T}_T,
           {n_4, b, n_5}_B, {n_5, c, EOF_B}_B, {n_1, n_4}_K

A → T_2 :  {n_1, n_4}_K
T_2 → A :  {n_6, a, n_7}_T, {n_7, c, n_1}_T, {n_8, c, n_9}_B,
           {n_9, a, n_10}_B, {n_10, b, n_4}_B, {n_6, n_8}_K

A → T_1 :  {n_6, n_8}_K
T_1 → A :  {n_11, a, n_12}_T, {n_12, b, n_13}_T, {n_13, c, n_6}_T,
           {n_14, b, n_15}_B, {n_15, c, n_8}_B, {n_11, n_14}_K

This session, which uses the tiles in the order $T_1, T_2, T_1$, builds a top path abcacabc,
and a bottom path bccabbc.
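
The way the “Tiles” roles chain nodes together with fresh nonces can be sketched as follows. This is an illustrative model only: encryption and the session key are ignored, each path is stored as a dictionary from a node name to a (symbol, next node) pair, and the helper names are hypothetical.

```python
import itertools

# Top and bottom words of the first two tiles used in the example session.
TILES = {"T1": ("abc", "bc"), "T2": ("ac", "cab")}
EOF_T, EOF_B = "EOF_T", "EOF_B"

def prepend(word, old_head, links, fresh):
    """Add one linked node per symbol of `word` in front of `old_head`."""
    nodes = [next(fresh) for _ in word]
    for i, symbol in enumerate(word):
        following = nodes[i + 1] if i + 1 < len(word) else old_head
        links[nodes[i]] = (symbol, following)
    return nodes[0]

def walk(head, links, eof):
    """Follow the links from `head` to the end marker and read off the path."""
    symbols = []
    while head != eof:
        symbol, head = links[head]
        symbols.append(symbol)
    return "".join(symbols)

def run_session(order):
    """Contact the tile roles in `order`; each role prepends its tile to both paths."""
    fresh = (f"n{i}" for i in itertools.count(1))   # fresh nonces n1, n2, ...
    top_links, bottom_links = {}, {}
    top_head, bottom_head = EOF_T, EOF_B
    for name in order:
        top_word, bottom_word = TILES[name]
        top_head = prepend(top_word, top_head, top_links, fresh)
        bottom_head = prepend(bottom_word, bottom_head, bottom_links, fresh)
    return walk(top_head, top_links, EOF_T), walk(bottom_head, bottom_links, EOF_B)

top_path, bottom_path = run_session(["T1", "T2", "T1"])
```

Running the session in the order T1, T2, T1 reproduces the two paths quoted in the text. Note that every message only ever mentions a constant number of nodes, while the route itself can grow without bound.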

We assume that the values $EOF_T$ and $EOF_B$, as well as the nodes in the tile
set,  are  public  information.  An  intruder  is  able  to  attack  the  protocol  and  learn 
SECRET  if  he  can  construct  a  route  where  the  top  and  bottom  paths  are  identical, 
by  selecting  the  tiles  in  the  appropriate  sequence,  feeding  messages  into  the  protocol 
to  build  a  route,  and  then  feeding  the  route  through  the  Checker  protocol  to  cause 
it  to  broadcast  SECRET.  Note  that  selecting  the  set  of  tiles  is  equivalent  to  solving 
PCP,  so  the  protocol  is  insecure  if  the  intruder  can  solve  PCP.  In  other  words,  since 
PCP  is  undecidable,  the  secrecy  problem  for  this  protocol  class  is  undecidable. 
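
Because PCP is undecidable, any search for a match has to be cut off at some bound. The sketch below shows such a bounded brute-force search in Python; the function name and the length bound are assumptions for illustration, and the first tile set is the standard textbook example instance rather than one taken from this paper.

```python
from itertools import product

def find_match(tiles, max_len):
    """Return a list of tile indices forming a match of length <= max_len, or None.

    Returning None only means that no match exists within the bound; it says
    nothing about longer sequences.
    """
    for length in range(1, max_len + 1):
        for seq in product(range(len(tiles)), repeat=length):
            top = "".join(tiles[i][0] for i in seq)
            bottom = "".join(tiles[i][1] for i in seq)
            if top == bottom:
                return list(seq)
    return None

# A standard example instance that has a match of length five.
textbook_tiles = [("b", "ca"), ("a", "ab"), ("ca", "a"), ("abc", "c")]
# The first two tiles of Table 12: every top starts with 'a', no bottom does.
path_tiles = [("abc", "bc"), ("ac", "cab")]

match = find_match(textbook_tiles, 5)
no_match = find_match(path_tiles, 4)
```

A protocol analyzer exploring the Path Protocol faces the same situation: it can report an attack when it finds one, but a failed bounded search does not establish secrecy.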

Note that it is also possible to construct a simple protocol using MSR that solves
PCP,  but  uses  arbitrary  length  messages.  The  point  of  our  example  here  is  that  by 
using  nonces  we  can  construct  a  PCP  solution  using  small  messages,  where  the  size 
of  the  messages  depends  on  the  size  of  the  problem  (i.e.  the  size  of  the  tiles),  not  on 
the  size  of  the  problem  solution. 

7  Comparison  to  Other  Work 

The MSR formalism is based on earlier work first presented in [Mit98, DM99b,
CDL+99]. The complexity results for undecidability and DEXP-completeness without
a disequality test were first published in [DLMS99], with the NP-completeness
results presented at the FMSP workshop talk in 1999. The complexity results with
disequality test are presented here for the first time.




A  complexity  case  that  has  been  studied  fairly  extensively  is  the  one  with  a 
bounded  number  of  roles  and  an  unbounded  message  size.  This  class  is  of  interest 
because it seems to be practical to apply model-checking and exhaustive search
techniques. This case was shown to be decidable in [Hui99] and NP-complete (without
restriction  to  atomic  keys)  in  [RT01].  This  work  basically  shows  that  the  bounded 
nature  of  the  protocols  imposes  its  own  natural  limit  on  the  message  space  that  can 
be  productively  exploited  by  the  attacker. 

Additional work has also been done making use of the MSR formalism in areas
other than complexity analysis. This includes relating strands [FHG98] to
MSR [CDM+03], and using MSR as a common intermediate language for CAPSL
[DM99a]. A typing infrastructure has been added to MSR, based on the theory of
dependent types with subsorting [Cer01b], and this typed MSR was used to prove that
the Dolev-Yao intruder can emulate the actions of an arbitrary adversary [Cer01a].
Recent work used MSR to formally analyze the Kerberos 5 protocol, discovering
several anomalies [BCJS02].

8  Conclusion 

In this paper we have defined the formalism for Multiset Rewriting with existentials,
and  shown  how  to  use  this  formalism  to  describe  security  protocols  and  the  Dolev- 
Yao  attacker  model.  We  use  this  formalism  to  analyze  the  complexity  of  the  secrecy 
problem  in  protocol  analysis,  under  various  restrictions  to  message  size,  number  of 
protocol  roles,  and  number  of  nonces. 

Protocol  analysis  is  theoretically  hard,  but  many  automated  tools  do  exist  that 
can  provide  useful  insight  into  the  problem.  These  tools  usually  are  limited  to  some 
approximation  of  the  protocol  secrecy  problem  as  we  have  defined  it  in  this  paper. 
For instance, tools such as model-checkers [Low96, Mea96, MMS97, Ros95, Sch96]
limit the number of roles and nonces, while other tools such as TAPS [Coh00] ignore the
linear nature (states) of the protocol roles. These tools can prove quite useful in
identifying protocol bugs and possible attack scenarios, though a model-checker
can  only  prove  there  are  no  attacks  within  the  limits  of  its  search,  and  a  non-linear 
model  might  discover  spurious  attacks  that  need  to  be  examined  and  eliminated  by 
hand.  Symbolic  tools  like  Athena  [Son99]  don’t  limit  the  number  of  roles,  but  also 
can’t  be  guaranteed  to  terminate.  In  general,  finding  attacks  in  a  limited  case  is 
easier  than  proving  that  there  aren’t  any  attacks  in  the  general  case. 

We  have  identified  an  open  problem  for  the  complexity  of  the  secrecy  case  with 
disequality,  unbounded  roles,  and  bounded  nonces  (the  ???  box  in  Table  9).  We 
conjecture that the additional power of the disequality test makes this case undecidable.
Other future work in this area could include solving this open problem, as
well  as  relating  MSR  to  other  protocol  analysis  formalisms  such  as  spi  calculus,  and 
applying  MSR  to  the  analysis  of  specific  protocols. 

Acknowledgements  Thanks  to  Rohit  Chadha  and  Vitaly  Shmatikov  for  their 
helpful comments.

A  Horn Clause Turing Machine Reductions

A.1  Existential Horn Clauses is Undecidable

We  use  a  construction  based  on  axiomatizing  a  Cook’s-theorem-style  Turing  machine 
tableau,  to  prove  the  undecidability  of  existential  Horn  clauses.  The  notation  we 
use  here  to  encode  a  Turing  machine  is  similar  to  that  used  in  Section  2.4,  though 
here  we  are  representing  an  entire  Turing  machine  tableau,  and  in  Section  2.4  we 
were  representing  the  step-by-step  computation.  Because  the  tableau  is  non-linear 
in  nature,  all  facts  that  appear  in  the  Horn  clauses  must  be  true  at  all  times,  so 
there  are  necessarily  differences  between  the  two  encodings.  For  example,  we  cannot 
use  the  Curr  predicate  to  represent  the  current  state  of  the  machine,  because  the 
intruder  could  replay  an  out-of-date  fact  at  any  time.  Instead,  the  contents  of  the 
Curr  predicate  is  included  in  the  Cont  predicate,  which  includes  both  the  unique 
name  of  the  cell  in  the  tableau,  and  the  cell’s  contents. 

We construct a tableau describing the computation of a DTM $M$ on a given input
$w \in \Sigma^*$, where $|w| = n$, using a set of existential Horn clauses. An example of the
tableau we construct is shown in Table 13. Also, a nice picture of a tableau similar
to the one we use appears in [Sip97, page 255]. The atomic formula $A(b_1, \ldots, b_k)$
mentioned in the statement of the lemma can be an atomic formula that is derivable
by a rule that requires, in its hypothesis, that the Turing machine is in a halting
state.

Take any language $L \subseteq \Sigma^*$. Let $M = (\Sigma, Q, \delta, q_0, Q^+)$ be a Deterministic Turing
Machine (DTM). Here, $\Sigma$ is a finite alphabet of tape symbols, containing the special
blank symbol □, $Q$ is a finite set of states, $\delta : (Q \times \Sigma) \to \Sigma \times \{L, R\} \times Q$ is the
transition relation, $q_0 \in Q$ is the initial state, and $Q^+ \subseteq Q$ is the set of accepting
states. Without loss of generality, we assume a Turing machine with a semi-infinite
tape that guarantees the head will not run off the left end of the tape, and we
assume that every accepting state is a terminal state.

We  construct  a  set  of  Horn  Clauses  H(M,w)  as  follows: 

Notation  First  we  define  three  predicates  that  describe  the  contents  of  the  cells 
in  the  tableau,  and  their  relationship  to  each  other. 

Cont (x,a,q)  Cell  x  has  contents  a.  If  present,  q  means  the  tape 
head  is  in  cell  x  and  the  machine  is  in  state  q. 

Adj  (x,y)  Cell  x  is  adjacent  to  cell  y. 

Below (x,y)  Cell  y  is  below  cell  x. 

We introduce some special constants. $c_{eot}$ is a special cell name that labels the
cell at the right end of the tape on each row of the tableau. The symbol $@ \notin Q$ is
a placeholder for the machine state in those cells that don’t contain the tape head.
The symbol $\# \notin \Sigma$ is used as the contents for the cells at both ends of the tape.

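Transition clauses  For each transition relation in $\delta$, we introduce a clause. For
a transition that moves the tape head to the left, $\delta(q_i, s) = \{(q_j, s', L)\}$, we have the
following:

\[
\begin{aligned}
\forall x, y, z, a, b.\ [\ &(\mathrm{Adj}(x,y) \land \mathrm{Adj}(y,z)\ \land \\
&\ \mathrm{Cont}(x,a,@) \land \mathrm{Cont}(y,s,q_i) \land \mathrm{Cont}(z,b,@)) \\
\Rightarrow\ &\exists x', y', z'.\ ((\mathrm{Adj}(x',y') \land \mathrm{Adj}(y',z')\ \land \\
&\ \mathrm{Below}(x,x') \land \mathrm{Below}(y,y') \land \mathrm{Below}(z,z'))\ \land \\
&\ \mathrm{Cont}(x',a,q_j) \land \mathrm{Cont}(y',s',@) \land \mathrm{Cont}(z',b,@))\ ]
\end{aligned}
\]

Similarly, for a transition that moves the tape head to the right,
$\delta(q_i, s) = \{(q_j, s', R)\}$:

\[
\begin{aligned}
\forall x, y, z, a, b.\ [\ &(\mathrm{Adj}(x,y) \land \mathrm{Adj}(y,z)\ \land \\
&\ \mathrm{Cont}(x,a,@) \land \mathrm{Cont}(y,s,q_i) \land \mathrm{Cont}(z,b,@)) \\
\Rightarrow\ &\exists x', y', z'.\ ((\mathrm{Adj}(x',y') \land \mathrm{Adj}(y',z')\ \land \\
&\ \mathrm{Below}(x,x') \land \mathrm{Below}(y,y') \land \mathrm{Below}(z,z'))\ \land \\
&\ \mathrm{Cont}(x',a,@) \land \mathrm{Cont}(y',s',@) \land \mathrm{Cont}(z',b,q_j))\ ]
\end{aligned}
\]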

Maintenance  clauses  In  addition  to  the  transition  clauses,  we  need  several  other 
clauses  that  are  used  to  construct  the  rest  of  the  tableau  that  is  not  near  the  tape 
head. 




This  clause  copies  the  contents  of  the  tape  to  the  next  row  in  the  tableau,  creating 
a  new  cell  below  the  old  one,  provided  the  cell  and  its  neighbors  do  not  contain  the 
tape  head. 

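\[
\begin{aligned}
\forall x, y, z, a, b, c.\ [\ &(\mathrm{Adj}(x,y) \land \mathrm{Adj}(y,z)\ \land \\
&\ \mathrm{Cont}(x,a,@) \land \mathrm{Cont}(y,b,@) \land \mathrm{Cont}(z,c,@)) \\
\Rightarrow\ &\exists y'.\ (\mathrm{Below}(y,y') \land \mathrm{Cont}(y',b,@))\ ]
\end{aligned}
\]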

There  is  a  tape  maintenance  clause  for  adding  a  new  cell  at  the  right  end  of  the 
tape  at  each  step: 

$\forall x, y, a.\ [\mathrm{Adj}(x, c_{eot}) \land \mathrm{Below}(x, y) \Rightarrow \exists z.\ \mathrm{Adj}(y, z) \land \mathrm{Cont}(z, □, @) \land \mathrm{Adj}(z, c_{eot})]$

This  rule  ensures  that  the  computation  can  never  run  off  the  right  end  of  the  tape, 
since  the  tape  head  starts  at  the  leftmost  cell,  each  step  can  only  move  the  tape 
head  to  the  right  by  at  most  one,  and  this  rule  creates  a  new  cell  at  each  step.  So  at 
the  m’th  step  of  the  computation,  there  are  always  at  least  m  +  n  cells  in  the  tape. 
The  right  end  of  the  actual  tape  is  infinite,  but  our  tableau  only  needs  to  represent 
a  finite  number  of  cells  on  each  row,  with  the  rest  of  the  cells  to  the  right  assumed 
to  contain  □. 

Another  special  maintenance  clause  is  needed  to  allow  the  left  marker  cell  to  be 
copied  down  to  each  successive  row  of  the  tableau: 

$\forall x, y, a.\ [\mathrm{Adj}(x, y) \land \mathrm{Cont}(x, \#, @) \Rightarrow \exists z.\ \mathrm{Below}(x, z) \land \mathrm{Cont}(z, \#, @)]$

Finally,  we  need  a  way  to  connect  together  the  cells  created  by  the  above  rules. 
This  clause  generates  the  adjacency  facts  for  cells  that  are  below  adjacent  cells. 

$\forall x, y, x', y'.\ [\mathrm{Adj}(x, y) \land \mathrm{Below}(x, x') \land \mathrm{Below}(y, y') \Rightarrow \mathrm{Adj}(x', y')]$

Initialization  The initial state of the DTM has the input $w = w_1 w_2 \cdots w_n$ in the
first n cells of the tape, with the rest of the infinite tape containing □, the tape
head in the first cell, and the machine state $q_0$. In our construction we assign the
cells in the first row the names $c_0, c_1, \ldots, c_n, c_{n+1}, c_{n+2}$, and we include some special
marker cells at the ends of the active region of the tape.

We  represent  the  top  row  of  the  tableau  by  describing  the  adjacency  of  the  cells, 
and  their  contents,  as  follows: 

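\[
\begin{aligned}
\{\ &\mathrm{Adj}(c_0, c_1),\ \mathrm{Adj}(c_1, c_2),\ \ldots, \\
&\mathrm{Adj}(c_n, c_{n+1}),\ \mathrm{Adj}(c_{n+1}, c_{n+2}),\ \mathrm{Adj}(c_{n+2}, c_{eot}), \\
&\mathrm{Cont}(c_0, \#, @),\ \mathrm{Cont}(c_1, w_1, q_0),\ \mathrm{Cont}(c_2, w_2, @),\ \ldots, \\
&\mathrm{Cont}(c_n, w_n, @),\ \mathrm{Cont}(c_{n+1}, □, @),\ \mathrm{Cont}(c_{n+2}, □, @),\ \mathrm{Cont}(c_{eot}, \#, @)\ \}
\end{aligned}
\]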

Note: The two extra blank cells after the input are provided to ensure correct operation
of the construction on the empty input $w = \epsilon$.
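As a concrete illustration, the top-row facts can be generated mechanically from the input string. The sketch below is not taken from the construction itself: the function name `initial_row`, the tuple encoding of facts, and the use of `"_"` for the blank symbol are assumptions made for the example. For the empty input it further assumes that the head starts on cell $c_1$, which then holds a blank.

```python
def initial_row(w):
    """Return the set of top-row facts for input string w.

    Cells are named c0 .. c{n+2}, followed by the end marker c_eot.
    """
    n = len(w)
    cells = [f"c{i}" for i in range(n + 3)] + ["c_eot"]
    # Adjacency facts chain the cells from left to right.
    facts = {("Adj", a, b) for a, b in zip(cells, cells[1:])}
    # Marker cells at both ends of the active region.
    facts.add(("Cont", "c0", "#", "@"))
    facts.add(("Cont", "c_eot", "#", "@"))
    for i in range(1, n + 3):
        symbol = w[i - 1] if i <= n else "_"   # input symbols, then two blanks
        state = "q0" if i == 1 else "@"        # the head starts on cell c1
        facts.add(("Cont", f"c{i}", symbol, state))
    return facts
```

For an input of length $n$ this yields $n + 3$ adjacency facts and $n + 4$ content facts.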
Termination  The  acceptance  condition  is  represented  by  a  set  of  clauses  that
derive  the  ACCEPT  fact: 


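$$\begin{aligned}
\forall x, a.\ [\mathrm{Cont}(x, a, q_{a_1}) &\Rightarrow \mathrm{ACCEPT}] \\
\forall x, a.\ [\mathrm{Cont}(x, a, q_{a_2}) &\Rightarrow \mathrm{ACCEPT}]
\end{aligned}$$

for each $q_{a_j} \in Q^+$.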

Lemma A.1. $H(M,w) \vdash \mathrm{ACCEPT}$ if and only if machine $M$ halts in an accepting
state on input string $w$.

Proof. We first show that if machine $M$ halts in an accepting state on input string
$w$, then $H(M,w) \vdash \mathrm{ACCEPT}$. The accepting computation of $M$ can be represented
by an accepting tableau, as described earlier and illustrated in Table 13, where each
line of the tableau corresponds to a configuration of the accepting computation. If
$M$ halts in an accepting state on input string $w$ after $\ell$ steps, then there
is a sequence of configurations $c_0, c_1, \ldots, c_\ell$ such that $c_0$ is the initial configuration
(i.e. the top line of the tableau), $c_\ell$ is a configuration with state $q_\ell \in Q^+$, and
for each consecutive pair of configurations $c_i, c_{i+1}$, $c_{i+1}$ can be obtained from $c_i$ by applying
some rule $\delta_j$ from $\delta$.

By construction, the Initialization clauses ensure that the initial Horn clauses in
$H(M,w)$ correspond to configuration $c_0$. If a state corresponding to configuration
$c_i$ can be reached by applying clauses in $H(M,w)$, then the state corresponding to
$c_{i+1}$ can be reached by applying the Transition clause that corresponds to the rule
$\delta_j$, plus the Maintenance clauses. The Transition clause builds the cells near the
tape head, and the Maintenance clauses build the other cells in the configuration
and ensure that they are connected together properly. These clauses together can be
used to construct a state that corresponds to configuration $c_{i+1}$. Finally, when
configuration $c_\ell$ is reached in the Turing machine tableau, $\mathrm{Cont}(x, a, q_\ell)$ will be true
for some cell $x$, so the Termination clause can be applied to derive ACCEPT from
$H(M,w)$.

Now we show that if $H(M,w) \vdash \mathrm{ACCEPT}$, then machine $M$ halts in an accepting
state on input string $w$. If $H(M,w) \vdash \mathrm{ACCEPT}$, then $\mathrm{Cont}(x, a, q_\ell)$ must be derivable
for some cell $x$ and some state $q_\ell \in Q^+$. By construction, the initial Horn clauses
in $H(M,w)$ correspond to the Turing machine's initial configuration, $c_0$. Only the
Termination clauses can be used to derive ACCEPT, and only the Transition clauses
can be used to create new cells whose contents contain the tape head (and thus
the machine state). So the sequence of Transition clauses used to derive ACCEPT
corresponds exactly to the sequence of transition rules in $\delta$ that are used in the
accepting computation of $M$.  □

Lemma A.2. The implication problem for existential Horn clauses without function
symbols is undecidable. In particular, there is no algorithm for deciding whether a set
of existential Horn clauses without function symbols implies a single atomic formula
$A(b_1, \ldots, b_k)$ without function symbols or variables.
(0,N)   (1,N)   (2,N)   ...   (n,N)   (n+1,N)   (n+2,N)   ...   (N-1,N)   (N,N)
  #                                                                         #

Table 14: Example: DEXP Turing Machine Tableau


Proof. This follows from Lemma A.1. Since we can reduce the halting problem to
the  implication  problem  for  existential  Horn  clauses  without  function  symbols,  the 
implication  problem  is  undecidable.  □ 

A.2  Non-Existential Horn Clauses is DEXP-Hard

We show that if a DTM $M$ halts in $N = 2^{n^t}$ steps on a given input $w$, where
$|w| = n$, then $M$ can be simulated by a set of existential-free Horn clauses. As for
the unbounded Turing machine in the previous section, we will construct a tableau
representing the computation. An example of such a tableau is in Table 14.

Take any language $A$ in DEXP. Let $M = (\Sigma, Q, \delta, q_0, Q^+)$ be a Deterministic
Turing Machine (DTM) that decides $A$ in $N = 2^{n^t}$ time for some constant $t$ and
input size $n$.

Here, $\Sigma$ is a finite alphabet of tape symbols, containing the special blank symbol
$\Box$, $Q$ is a finite set of states, $\delta : (Q \times \Sigma) \to \Sigma \times \{L, R\} \times Q$ is the transition relation,
$q_0 \in Q$ is the initial state, and $Q^+ \subseteq Q$ is the set of accepting states. Without loss
of generality, we assume a one-tape Turing machine that guarantees that a program
running in $N = 2^{n^t}$ time will not run off the end of the tape, and we assume that
every accepting state is a terminal state.

We construct a set of Horn clauses $H(M, w, 2^{n^t})$, where $w \in A$ is the input,
with $|w| = n$, as follows:

Notation  We use the same notation as for the unbounded Turing machine in
Appendix A.1, though we represent the cell number symbolically, breaking it up
into two parts, which can be viewed as representing the row and column of the cell.
The cell number is actually composed of $n^t + 1$ binary digits for the row and column
respectively, each bit a separate argument to the predicate.

A cell position is a pair of numbers, with each number represented by a sequence
of bits. For conciseness, we use the following abbreviations:

$$\begin{aligned}
\mathrm{Cont}(p, a, q) &= \mathrm{Cont}(\langle n, m \rangle, a, q) \\
\mathrm{Cont}(\langle n, m \rangle, a, q) &= \mathrm{Cont}(\langle \vec{x}, \vec{y} \rangle, a, q) \\
\mathrm{Cont}(\langle \vec{x}, \vec{y} \rangle, a, q) &= \mathrm{Cont}(x_0, \ldots, x_{n^t}, y_0, \ldots, y_{n^t}, a, q)
\end{aligned}$$

where $p$ is any position $\langle n, m \rangle$ and $n$ and $m$ are the numbers represented by the bit
vectors $\vec{x}$ and $\vec{y}$. We use similar abbreviations for the cell numbers in the Adj and
Below predicates.
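The expansion of the abbreviated form into one predicate argument per bit can be sketched as follows. This is an illustrative encoding only: the names `to_bits` and `cont_atom`, the tuple representation of atoms, and the most-significant-bit-first ordering are assumptions for the example. The parameter `width` plays the role of $n^t + 1$.

```python
def to_bits(value, width):
    """Binary expansion of value, most significant bit first, padded to width bits."""
    if not 0 <= value < 2 ** width:
        raise ValueError("value does not fit in the given number of bits")
    return tuple((value >> i) & 1 for i in reversed(range(width)))

def cont_atom(n, m, a, q, width):
    """Expand Cont(<n, m>, a, q) into Cont(x_0, ..., y_0, ..., a, q)."""
    return ("Cont",) + to_bits(n, width) + to_bits(m, width) + (a, q)
```

For example, with `width = 3`, the position $\langle 2, 1 \rangle$ expands to the bit arguments `0, 1, 0` followed by `0, 0, 1`, so the predicate has $2 \cdot 3 + 2 = 8$ arguments.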

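Transition Clauses  For each transition in $\delta$, we introduce a clause. For
a transition that moves the tape head to the left, $\delta(q_i, s) = \{(q_j, s', L)\}$, we have the
following:

$$\begin{aligned}
\forall x, x', y, y', z, z', a, b.\ [\,(&\mathrm{Adj}(x, y) \wedge \mathrm{Adj}(y, z) \wedge {} \\
&\mathrm{Cont}(x, a, @) \wedge \mathrm{Cont}(y, s, q_i) \wedge \mathrm{Cont}(z, b, @) \wedge {} \\
&\mathrm{Below}(x, x') \wedge \mathrm{Below}(y, y') \wedge \mathrm{Below}(z, z')) \\
\Rightarrow{} &(\mathrm{Cont}(x', a, q_j) \wedge \mathrm{Cont}(y', s', @) \wedge \mathrm{Cont}(z', b, @))\,]
\end{aligned}$$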

And  similarly  for  a  transition  that  moves  the  tape  head  to  the  right. 

Maintenance  clauses  This  clause  copies  the  contents  of  the  tape  to  the  next  row 
in  the  tableau,  provided  the  cell  and  its  neighbors  do  not  contain  the  tape  head. 

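$$\begin{aligned}
\forall x, x', y, y', z, z', a, b, c.\ [\,(&\mathrm{Adj}(x, y) \wedge \mathrm{Adj}(y, z) \wedge {} \\
&\mathrm{Cont}(x, a, @) \wedge \mathrm{Cont}(y, b, @) \wedge \mathrm{Cont}(z, c, @) \wedge \mathrm{Below}(y, y')) \\
\Rightarrow{} &\mathrm{Cont}(y', b, @)\,]
\end{aligned}$$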

This clause copies the contents of the marker cells (which will be initialized to
be the left and right margins of the first row) to the cells below.

$$\forall x, y.\ [\,(\mathrm{Below}(x, y) \wedge \mathrm{Cont}(x, \#, @)) \Rightarrow \mathrm{Cont}(y, \#, @)\,]$$
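Taken together, the transition, maintenance, and marker clauses determine each tableau row from the row above it. The following Python sketch shows one such step under simplifying assumptions that are not part of the construction: a row is a list of `(symbol, state)` pairs indexed by column, `"@"` means the head is absent, `#` occurs only in the two marker cells, and `delta` is a hypothetical dictionary mapping `(state, symbol)` to `(new_state, new_symbol, move)`. The right-moving case is written by symmetry with the left-moving clause.

```python
NO_HEAD = "@"

def next_row(row, delta):
    """Compute the next tableau row from `row` using the three kinds of clause."""
    n = len(row)
    nxt = [None] * n
    # Marker clause: Below(x, y) and Cont(x, #, @) => Cont(y, #, @).
    for x in range(n):
        if row[x] == ("#", NO_HEAD):
            nxt[x] = row[x]
    # Maintenance clause: copy cell y when x, y, z are adjacent and none holds the head.
    for y in range(1, n - 1):
        if all(state == NO_HEAD for _, state in row[y - 1:y + 2]):
            nxt[y] = row[y]
    # Transition clause: rewrite the three cells around the head.
    for y in range(1, n - 1):
        (a, qx), (s, q), (b, qz) = row[y - 1], row[y], row[y + 1]
        if qx == qz == NO_HEAD and (q, s) in delta:
            q2, s2, move = delta[(q, s)]
            nxt[y - 1] = (a, q2 if move == "L" else NO_HEAD)
            nxt[y] = (s2, NO_HEAD)
            nxt[y + 1] = (b, q2 if move == "R" else NO_HEAD)
    return nxt
```

A cell whose neighborhood contains the head is never filled in by the maintenance clause, so the transition clause is the only source of content for the head cell and its two neighbors.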

Initialization  As for the unbounded Turing machine, the initial state of the DTM
has the input $w = w_1 w_2 \cdots w_n$ in the first $n$ cells of the tape, with the rest of the
infinite tape containing $\Box$, the tape head in the first cell, and the machine state $q_0$.
As described above, we use symbolic tape cell names of the form $\langle x, y \rangle$ to label the
cells. We also include some special marker cells at the ends of the tape.

We  represent  the  top  row  of  the  tableau  by  describing  the  initial  contents  of  the 
tape,  with  the  input  word  first,  and  the  rest  of  the  tape  row  containing  blanks. 

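$$\{\mathrm{Cont}(\langle 1, 0 \rangle, w_1, q_0),\ \mathrm{Cont}(\langle 2, 0 \rangle, w_2, @),\ \ldots,\ \mathrm{Cont}(\langle n, 0 \rangle, w_n, @)\}$$

and

$$\{\mathrm{Cont}(\langle n+1, 0 \rangle, \Box, @),\ \mathrm{Cont}(\langle n+2, 0 \rangle, \Box, @),\ \ldots,\ \mathrm{Cont}(\langle 2^{n^t} - 1, 0 \rangle, \Box, @)\}$$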
We use a set of $n^t + 1$ adjacency facts to describe the horizontal connections
between the cells in the tableau. Here we use the notation $1^n$ to indicate a string of
$n$ 1's, and $0^n$ to indicate a string of $n$ 0's.

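$$\begin{aligned}
\{\ & \forall \vec{x}, \vec{y}.\ [\mathrm{Adj}(\langle \vec{x}0, \vec{y} \rangle, \langle \vec{x}1, \vec{y} \rangle)] \\
& \forall \vec{x}, \vec{y}.\ [\mathrm{Adj}(\langle \vec{x}01, \vec{y} \rangle, \langle \vec{x}10, \vec{y} \rangle)] \\
& \forall \vec{x}, \vec{y}.\ [\mathrm{Adj}(\langle \vec{x}011, \vec{y} \rangle, \langle \vec{x}100, \vec{y} \rangle)] \\
& \qquad \vdots \\
& \forall \vec{y}.\ [\mathrm{Adj}(\langle 01^{n^t}, \vec{y} \rangle, \langle 10^{n^t}, \vec{y} \rangle)]\ \}
\end{aligned}$$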

And we need a set of $n^t + 1$ belowness facts to describe the vertical connections
between the cells.

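$$\begin{aligned}
\{\ & \forall \vec{x}, \vec{y}.\ [\mathrm{Below}(\langle \vec{x}, \vec{y}0 \rangle, \langle \vec{x}, \vec{y}1 \rangle)] \\
& \forall \vec{x}, \vec{y}.\ [\mathrm{Below}(\langle \vec{x}, \vec{y}01 \rangle, \langle \vec{x}, \vec{y}10 \rangle)] \\
& \forall \vec{x}, \vec{y}.\ [\mathrm{Below}(\langle \vec{x}, \vec{y}011 \rangle, \langle \vec{x}, \vec{y}100 \rangle)] \\
& \qquad \vdots \\
& \forall \vec{x}.\ [\mathrm{Below}(\langle \vec{x}, 01^{n^t} \rangle, \langle \vec{x}, 10^{n^t} \rangle)]\ \}
\end{aligned}$$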
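The adjacency and belowness facts encode binary increment: a number ending in a 0 followed by $k$ ones is followed by the same prefix ending in a 1 followed by $k$ zeros. With numbers of $n^t + 1$ bits there is one pattern for each $k = 0, \ldots, n^t$, which accounts for the $n^t + 1$ facts. The Python sketch below checks this reading on small widths; the function names and the use of bit strings are assumptions made for illustration.

```python
def successor_patterns(width):
    """Yield (prefix_length, low_bits_before, low_bits_after) for each pattern.

    Pattern k rewrites  prefix 0 1^k  to  prefix 1 0^k ; there are `width` patterns.
    """
    for k in range(width):
        yield width - 1 - k, "0" + "1" * k, "1" + "0" * k

def adjacent(x, y, width):
    """True if the bit strings x and y match one of the successor patterns."""
    for prefix_len, before, after in successor_patterns(width):
        if (x[:prefix_len] == y[:prefix_len]
                and x[prefix_len:] == before
                and y[prefix_len:] == after):
            return True
    return False
```

A linear number of universally quantified facts therefore describes all $2^{n^t + 1} - 1$ successor pairs, which is what keeps the clause set polynomial in the size of the input even though the tableau is exponentially large.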

Finally,  we  need  to  initialize  the  contents  of  the  special  marker  cells  on  the  two 
ends  of  the  tape,  the  left  margin  and  right  margin  of  the  first  row: 

$$\{\mathrm{Cont}(\langle 0, 0 \rangle, \#, @),\ \mathrm{Cont}(\langle 2^{n^t}, 0 \rangle, \#, @)\}$$
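The content facts for the first row can be enumerated directly for small inputs. The sketch below is illustrative only: the name `initial_facts`, the `(column, row)` pair encoding of positions, and `"_"` for the blank symbol are assumptions, and the function requires a non-empty input. It also makes visible why positions need $n^t + 1$ bits: the right marker sits at column $2^{n^t}$, whose binary representation is a 1 followed by $n^t$ zeros.

```python
def initial_facts(w, t):
    """Return (facts, N) for the first row, where N = 2 ** (len(w) ** t)."""
    if not w:
        raise ValueError("this sketch assumes a non-empty input")
    n = len(w)
    N = 2 ** (n ** t)
    # Marker cells at the left and right margins of row 0.
    facts = {("Cont", (0, 0), "#", "@"), ("Cont", (N, 0), "#", "@")}
    for col in range(1, N):
        symbol = w[col - 1] if col <= n else "_"   # input word, then blanks
        state = "q0" if col == 1 else "@"          # head starts on the first input cell
        facts.add(("Cont", (col, 0), symbol, state))
    return facts, N
```

Unlike the adjacency facts, this listing has $2^{n^t} + 1$ entries, so it is only practical to enumerate for very small $n$ and $t$.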


Termination  The  acceptance  condition  is  represented  by  a  set  of  clauses  that 
derive  the  ACCEPT  fact: 

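$$\begin{aligned}
\forall x, a.\ [\mathrm{Cont}(x, a, q_{a_1}) &\Rightarrow \mathrm{ACCEPT}] \\
\forall x, a.\ [\mathrm{Cont}(x, a, q_{a_2}) &\Rightarrow \mathrm{ACCEPT}]
\end{aligned}$$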


Lemma A.3. $H(M,w,N) \vdash \mathrm{ACCEPT}$ if and only if machine $M$ accepts the input
string $w$ of length $n$ within $N = 2^{n^t}$ steps.

Proof. This follows from an argument similar to Lemma A.1, though the cells are
named  symbolically  using  constants,  and  the  Turing  machine  tableau  is  slightly 
different,  as  described  above,  and  illustrated  in  Table  14.  □ 

Lemma A.4. The implication problem for Horn clauses without function symbols
or existentials is DEXP-hard. In particular, any algorithm for deciding whether
a set of Horn clauses without function symbols or existentials implies a single atomic
formula $A(b_1, \ldots, b_k)$ without function symbols, variables or existentials requires,
in the worst case, time exponential in the size of the input formula.
Proof. This follows from Lemma A.3. Since we can reduce the decision problem for
a  DEXP-time  Turing  machine  to  the  implication  problem  for  Horn  clauses  without 
existentials  or  function  symbols,  the  implication  problem  is  DEXP-hard.  □ 
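For intuition about why this implication problem is nevertheless decidable, the following Python sketch evaluates function-free Horn clauses without existentials by naive forward chaining to a fixpoint. It is not part of the proof above. The encoding is an assumption made for the example: atoms are tuples, terms that are strings beginning with an uppercase letter are variables, and every variable in a clause head must also occur in its body. Because no new constants are ever created, the set of derivable ground atoms is finite, but its size can be exponential in the predicate arity.

```python
def is_var(term):
    """Variables are strings that start with an uppercase letter."""
    return isinstance(term, str) and term[:1].isupper()

def match(atom, fact, env):
    """Extend the substitution env so that atom equals fact, or return None."""
    if atom[0] != fact[0] or len(atom) != len(fact):
        return None
    env = dict(env)
    for term, value in zip(atom[1:], fact[1:]):
        if is_var(term):
            if env.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return env

def derives(facts, rules, goal):
    """Return True if goal follows from facts and rules; rules are (body, head) pairs."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for body, head in rules:
            envs = [{}]
            for atom in body:
                envs = [e2 for e in envs for f in known
                        if (e2 := match(atom, f, e)) is not None]
            for env in envs:
                new = (head[0],) + tuple(env[t] if is_var(t) else t for t in head[1:])
                if new not in known:
                    known.add(new)
                    changed = True
    return goal in known
```

For instance, with the fact `("Cont", "c1", "a", "qa")` and the rule `((("Cont", "X", "A", "qa"),), ("ACCEPT",))`, the call `derives(facts, rules, ("ACCEPT",))` returns `True`.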